Data Analytics for Customer Facing Applications Jaideep Srivastava Computer Science & Engineering

Data Analytics for
Customer Facing Applications
Jaideep Srivastava
Computer Science & Engineering
srivasta@cs.umn.edu
2/25/2008
Presentation Outline
Technology trends
Customer facing applications
Status of CRM efforts
Analytical CRM
Customer segmentation
Customer loyalty
Customer retention
Analytical CRM architecture
Data warehouse
Dimensional data modeling
On-line analytical processing (OLAP)
Data mining
Amazon.com: case study in building customer loyalty
Analytics behind e-marketing
Yodlee.com: case study in web business intelligence
Privacy issues
Conclusion
© Jaideep Srivastava
Technology Trends
Internet growth
Faster than any other infrastructure
Data collection
Rapid drop in storage costs
Dramatic improvement in resolution and rate
of data collection ‘probes’
Data analytics
Increasing deployment of warehouses
Major leap forward in data mining
technologies and tools
Becoming possible to really understand what your
customers want – even at the individual level!!
Infrastructure Adoption in the US
[Chart: millions of users (0 to 120) versus year (1922 to 2000) for Radio, TV, Cable, and Internet]
Product Marketing – 75 years ago
• Production – à la Adam Smith
• “You can have any color as long as it’s black” – Ford Motor Co.
Product Marketing - today
Add the spice of flexibility, courtesy of robotics, computers …
New approach to marketing
FROM: Finding customers that are right for each product
TURN the process through 90 degrees
TO: Finding products that are right for each customer
To achieve this we need to align around
•Organization and culture
•Business processes and skill
•Measurement and incentives
•Information management
•Technology
“Mass Customization” – B. Joseph Pine
Mass production
Cheap to produce
Efficient to produce
Uniform features/quality
‘one size fits all’ approach
Optimize production cost
Customization
Expensive to produce
Inefficient to produce
Customized features
‘tailor made’ approach
Optimize customer satisfaction
Mass customization
Cheap & efficient to produce
Customized features
‘tailor made’ approach
Optimize production cost & customer satisfaction
Customer Facing Applications
Customer Facing Applications
Consumer marketing
Campaign management
Opportunity management
Web-based encyclopedia, configurator
Market segmentation
Lead generation/enhancement/tracking
Customer Facing Applications
Customer care & support
Incident assignment/escalation/tracking/reporting
Problem management/resolution
Order management/promise fulfillment
Warranty/contract management
Field service support
Work orders, dispatching
Real time information transfer to field personnel via
mobile technologies
Customer Facing Applications
Corporate sales
Contact management profiles and history
Account management including activities
Order entry
Proposal generation
Sales management
Pipeline analysis, e.g. forecasting
Sales cycle analysis
Territory alignment
Roll-up and drill-down reporting
Status of Customer Relationship
Management (CRM) Efforts
Companies are spending megabudgets on CRM
CRM = software + support services
European CRM expenditure = $1.2B + $3.0B = $4.2B*
UK marketing service industry growing at 17.4% to $7.7B
CRM
Relationship marketing
Customer service
Value added programs
Loyalty programs
Culture change
*Hewson Consulting October 2000
But - satisfaction is declining
And - more customers are
complaining
Increasing customer resistance
98% of customer solicitations are irrelevant
82% of individuals would like to block all
marketing access to their own data
Campaign hit rates and customer loyalty
indicators are declining
Consequently
The ‘best’ customers are being over-communicated to
Today’s less valuable customers are not being
developed into tomorrow’s ‘best’ customers
The business potential of the customer base
is not being maximized
Solution: Analytical CRM
CRM = Customer Understanding +
Relationship Management
Analytics helps in Customer Understanding
Analytics = OLAP, Statistical analysis, data
mining, etc.
Example Customer Facing Applications
Helped by Analytical CRM
Customer segmentation
Customer loyalty building
Customer retention/recovery
Customer segmentation
Purpose of segmentation is to identify groups of customers with similar needs and behavior patterns, so that they can be offered more tightly focused
Products
Services
Communications
Segments should be
Identifiable
Quantifiable
Addressable
Of sufficient size to be worth addressing
Two approaches to segmentation
Cluster common characteristics, and then map out behavior patterns
Separate out behavior patterns, then identify segment characteristics
Customer base segmentation
Potential business (vertical axis) versus actual business (horizontal axis)
                          Actual business: Low     Actual business: High
Potential business: High  Develop                  Retain
Potential business: Low   Observe & Incentivize    Care & Maintenance
Targeted communication to each segment
Segmentation by value
Express profits as deciles, and
ask questions
[Chart: profit (-1200 to 1200) by customer decile]
Chart annotations:
Should the focus be on retaining wallet share from segments 8 – 10? Or, on gaining from segments 1 – 4?
Who are these customers; what do they look like?
Are these worth keeping? Can we service them with a lower cost channel? What can we do to make this segment profitable?
Middle 60%, either side of break even. What can we do about these?
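A minimal sketch of the decile view, assuming per-customer profit figures are available. The customer ids, the profit values, and the convention that decile 1 is the least profitable are illustrative assumptions, not data from the slide.

```python
def assign_deciles(profits):
    """Map customer id -> decile (1 = least profitable, 10 = most profitable)."""
    ranked = sorted(profits, key=profits.get)  # customer ids, ascending by profit
    n = len(ranked)
    return {cust: (rank * 10) // n + 1 for rank, cust in enumerate(ranked)}


def decile_totals(profits):
    """Total profit contributed by each decile."""
    totals = {d: 0.0 for d in range(1, 11)}
    for cust, decile in assign_deciles(profits).items():
        totals[decile] += profits[cust]
    return totals


# Hypothetical profit per customer (made-up numbers).
profits = {f"c{i:02d}": p for i, p in enumerate(
    [-900, -400, -150, -40, -10, 5, 20, 60, 150, 300,
     -700, -300, -100, -30, -5, 10, 30, 80, 200, 1100])}

deciles = assign_deciles(profits)
totals = decile_totals(profits)
for d in range(1, 11):
    print(d, totals[d])
```

Plotting `totals` by decile gives the kind of bar chart the questions above refer to: the loss-making deciles, the middle group near break even, and the few deciles that carry most of the profit.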
Customer loyalty: close
relationships are more profitable
Relationship intensity and
defection odds
Evidence suggests that customer ‘lock in’ occurs once 4 or more products
are purchased
[Chart: odds of not defecting versus number of products purchased (1 to 4); plotted values 1.1%, 10.2%, 18.1%, 98.3%]
A difference of opinion …
[Chart: company view versus customer view; plotted values 90%, 70%, 32%, 2%]
Survey statements:
Customers are happy with our customer service
We research customer service needs and wants as part of our customer service improvement
Customer service needs no improvement
Customer service today is better than ever
… and action
[Chart: company view versus customer view; plotted values 98%, 43%, 7%]
Survey statements:
We want to develop a relationship with our customers
We want to form and develop a relationship with our suppliers
The relationship now is stronger than 12 months ago
Increasing propensity to buy over
a customer life cycle
Actions which build
relationship warmth
•No-fault service
•“Have a nice day”
•Targeted sales
Customer
relationship
profitability
Loyalty is built through a virtuous
circle of new customer experience
Virtuous circle of customer experience
Provides legitimacy to offer advice
Superlative customer service
Individualized and helpful dialog
Innovative new products
Provides legitimacy to offer advice
Excites the customer and builds loyalty
Lifetime Impact of Customer
Loyalty
[Chart: customer VALUE versus TIME, showing customer potential, “maximized” customer value, and “realized” customer value]
Managing Credit-Card Retention in the
Pacific Rim
•Behavioral Propensity Model based Campaigns generate
New Customers
•Selective score-based phone follow-up more than doubles
response
•“Event-driven”(Trans. Vol. & Value) Campaigns to
stimulate initial usage of credit-card.
•Propensity model + “Event-driven” Customer Retention
program identifies likely non-renewers 3 months prior to
renewal, and kicks in usage stimulation program
•Different offers (“Frequent User Club” versus Premium)
being tested
Impact: Over 100% improvement in both Acquisition and
Retention. New market opened up.
Using Negative Events to drive
Positive Sales
Event = “ATM request for cash” is rejected due to lack
of funds.
For credit-worthy customers, unsecured personal loan
is offered by mail or phone the next day!
30% acceptance rate of product offered.
Impact: Significant cross-sales of additional product
Significant reduction in negative reactions
Analytical CRM Architecture
Analytical CRM Loop
Hypothesis
generation
Results
Analysis
Action
Traditional Growth of Functions
in an Organization
THE PRESENT
MULTIPLE CHANNELS & DATA STORES / IMPERSONAL SERVICE
[Diagram: channels (3rd party resellers, kiosk, ATM, branch, outbound call centre, inbound call centre, web, fax, email, WAP) with separate data stores]
Impact!
• IMPERSONAL
• LOW QUALITY
• UNINFORMED
• INCONSISTENT
Vision for Customer Driven CRM
THE NEAR FUTURE
MULTIPLE CHANNELS & DATA STORES / PERSONALISED SERVICE
[Diagram: channels connected to DATA]
Impact!
• PERSONALISED
• HIGH QUALITY
• INFORMED
• CONSISTENT
Canonical Analytics Architecture
[Diagram: data sources (operational DBs, other sources) → Extract, Transform, Load, Refresh → data warehouse (with metadata, monitor & integrator) and data marts → Serve → OLAP server → analysis, query, reports, and data mining tools]
Data Warehouse
Data Warehouse
A decision support database that is
maintained separately from the organization’s
operational database
A data warehouse is a
subject-oriented,
integrated,
time-varying,
non-volatile
collection of data that is used primarily in
organizational decision making.
Data Warehouse - Subject Oriented
Subject oriented: oriented to the major subject areas of the corporation that have been defined in the data model.
E.g. for an insurance company: customer, product, transaction or activity, policy, claim, account, etc.
Operational DB and applications may be organized differently
E.g. based on type of insurance: auto, life, medical, fire, ...
Data Warehouse - Integrated
Encoding, naming conventions, … are not consistent among different data sources
When data is moved to the warehouse, it is converted to a consistent representation.
Data Warehouse - Non-Volatile
Operational data is regularly accessed and
manipulated a record at a time and update is done
to data in the operational environment.
Warehouse Data is loaded and accessed. Update
of data does not occur in the data warehouse
environment.
Data Warehouse - Time Variance
The time horizon for the data warehouse is
significantly longer than that of operational
systems.
Operational databases contain current value data.
Data warehouse data is nothing more than a sophisticated series of snapshots, taken as of some moment in time.
The key structure of operational data may or may not contain some element of time. The key structure of the data warehouse always contains some element of time.
Data Sources
Data sources are often the operational systems, providing
the lowest level of data.
Data sources are designed for operational use, not for
decision support, and the data reflect this fact.
Multiple data sources are often from different systems run
on a wide range of hardware and much of the software is
built in-house or highly customized.
Multiple data sources introduce a large number of issues, notably semantic conflicts.
Data Cleaning
Important to warehouse clean data
(operational data from multiple sources are
often dirty).
Three classes of tools
Data migration: allows simple data transformation
Data Scrubbing: uses domain-specific knowledge
to scrub data
Data auditing: discovers rules and relationships by
scanning data (detect outliers).
Load and Refresh
Loading the warehouse includes some other processing tasks: checking integrity constraints, sorting, summarizing, building indexes, etc.
Refreshing a warehouse means propagating
updates on source data to the data stored in the
warehouse
when to refresh
determined by usage, types of data source, etc.
how to refresh
data shipping: using triggers to update snapshot log table and
propagate the updated data to the warehouse
transaction shipping: shipping the updates in the transaction
log
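The refresh step can be sketched as replaying a change log against the warehouse copy. This is a minimal illustration only: the log layout `(sequence number, operation, key, row)`, the in-memory dictionary standing in for a warehouse table, and the function name `refresh` are all assumptions made for the example.

```python
def refresh(warehouse, change_log, last_applied):
    """Apply log entries newer than last_applied; return the new high-water mark."""
    high_water = last_applied
    for seq, op, key, row in sorted(change_log, key=lambda entry: entry[0]):
        if seq <= last_applied:
            continue  # already propagated by an earlier refresh
        if op == "delete":
            warehouse.pop(key, None)
        else:  # "insert" or "update"
            warehouse[key] = dict(row)
        high_water = seq
    return high_water


# Hypothetical warehouse table and source change log.
warehouse = {1: {"name": "Ann", "city": "Duluth"}}
log = [
    (1, "insert", 1, {"name": "Ann", "city": "Duluth"}),
    (2, "update", 1, {"name": "Ann", "city": "St. Paul"}),
    (3, "insert", 2, {"name": "Raj", "city": "Edina"}),
    (4, "delete", 2, None),
]
mark = refresh(warehouse, log, last_applied=1)
print(warehouse, mark)
```

Keeping the high-water mark is what makes the refresh incremental: the next run only needs log entries with a larger sequence number.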
Monitor
detect changes to an information source that
are of interest to the warehouse
define triggers in a full-functionality DBMS
examine the updates in the log file
write programs for legacy systems
propagate the change in a generic form to
the integrator
Integrator
receive changes from the monitors
make the data conform to the conceptual schema
used by the warehouse
integrate the changes into the warehouse
merge the data with existing data already present
resolve possible update anomalies
Metadata Repository
Administrative metadata
source databases and their contents
gateway descriptions
warehouse schema, view and derived data
definitions
dimensions and hierarchies
pre-defined queries and reports
data mart locations and contents
data partitions
data extraction, cleansing, transformation rules,
defaults
data refresh and purge rules
user profiles, user groups
security: user authorization, access control
Metadata Repository
Business metadata
business terms and definitions
ownership of data
charging policies
Operational metadata
data lineage: history of migrated data and
sequence of transformations applied
currency of data: active, archived, purged
Monitoring information: warehouse usage
statistics, error reports, audit trails
Data Marts
A data mart (departmental data warehouse) is a
specialized system that brings together the data
needed for a department or related applications.
Data marts can be implemented within the data
warehouse by creating special, application-specific
views.
Data marts can also be implemented as materialized views: departmental subsets that focus on selected subjects.
Data marts may use different data representations
and include their own OLAP engines
Other Tools
User interface that allows users to
interact with the warehouse
query and reporting tools
analysis tools
data mining tools
Dimensional Data Modeling
Conceptual Modeling of Data
Warehouses
Modeling data warehouses: dimensions &
measurements
Star schema: A single object (fact table) in the middle
connected to a number of objects (dimension tables)
Snowflake schema: A refinement of star schema where
the dimensional hierarchy is represented explicitly by
normalizing the dimension tables.
Fact constellations: Multiple fact tables share dimension
tables.
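A small star schema can be sketched with Python's built-in `sqlite3` module. The table and column names below are simplified, hypothetical versions of a sales schema, and the rows are made-up sample data; the point is only the shape: one fact table whose foreign keys point at the dimension tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE date_dim    (date_key TEXT PRIMARY KEY, month TEXT, year INTEGER);
CREATE TABLE product_dim (product_no INTEGER PRIMARY KEY, prod_name TEXT, category TEXT);
CREATE TABLE store_dim   (store_id INTEGER PRIMARY KEY, city TEXT, country TEXT);
-- Fact table: one row per sale, keyed by the dimensions, holding the measurements.
CREATE TABLE sales_fact (
    date_key     TEXT    REFERENCES date_dim(date_key),
    product_no   INTEGER REFERENCES product_dim(product_no),
    store_id     INTEGER REFERENCES store_dim(store_id),
    unit_sales   INTEGER,
    dollar_sales REAL
);
""")
conn.executemany("INSERT INTO date_dim VALUES (?, ?, ?)",
                 [("1998-01-05", "1998-01", 1998), ("1998-02-10", "1998-02", 1998)])
conn.executemany("INSERT INTO product_dim VALUES (?, ?, ?)",
                 [(1, "TV", "Electronics"), (2, "Cola", "Beverages")])
conn.executemany("INSERT INTO store_dim VALUES (?, ?, ?)",
                 [(10, "Minneapolis", "USA")])
conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?, ?)",
                 [("1998-01-05", 1, 10, 2, 800.0),
                  ("1998-02-10", 1, 10, 1, 400.0),
                  ("1998-02-10", 2, 10, 10, 15.0)])

# A typical star join: aggregate a measurement by attributes of two dimensions.
rows = conn.execute("""
    SELECT p.category, d.year, SUM(f.dollar_sales)
    FROM sales_fact f
    JOIN product_dim p ON p.product_no = f.product_no
    JOIN date_dim d    ON d.date_key = f.date_key
    GROUP BY p.category, d.year
    ORDER BY p.category
""").fetchall()
print(rows)
```

A snowflake variant would split `date_dim` or `store_dim` into further normalized tables (for example month and year), at the cost of extra joins in queries like the one above.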
Example of Star Schema
Sales Fact Table: Date, Product, Store, Cust; measurements: unit_sales, dollar_sales, Yen_sales
Date dimension: Date, Month, Year
Product dimension: ProductNo, ProdName, ProdDesc, Category, QOH
Store dimension: StoreID, City, State, Country, Region
Customer dimension: CustId, CustName, CustCity, CustCountry
Example of Snowflake Schema
Sales Fact Table: Date, Product, Store, Cust; measurements: unit_sales, dollar_sales, Yen_sales
Date dimension: Date, Month; normalized into Month (Month, Year) and Year (Year)
Product dimension: ProductNo, ProdName, ProdDesc, Category, QOH
Store dimension: StoreID, City; normalized into City (City, State), State (State, Country), and Country (Country, Region)
Customer dimension: CustId, CustName, CustCity, CustCountry
A Query Model
[Diagram: Customer Orders fact with dimension hierarchies]
Shipping Method: AIR-EXPRESS, TRUCK
Customer: CONTRACTS, ORDER
Time: ANNUALLY, QTRLY, DAILY
Product: PRODUCT LINE, PRODUCT GROUP, PRODUCT ITEM
Geography: COUNTRY, REGION, DISTRICT
Organization: DIVISION, DISTRICT, SALES PERSON
Promotion
Summary Tables
Data warehouse may store some selected summary data, the pre-aggregated data.
Summary data can be stored as separate fact tables sharing the same dimension tables with the base fact table.
Summary data can be encoded in the original fact table and dimension tables.
Date dimension table:
id  level  date  month  year
0   1      1     1      1998
1   2      NULL  1      1998
2   2      NULL  2      1998
3   3      NULL  NULL   1998
Fact table:
DateID  ProdID  Sales
0       1       1000
1       1       20000
1       2       40000
3       1       300000
Multidimensional Data
Sales volume as a function of product, time, and
geography
Dimensions: Product, Region, Week
Hierarchical summarization paths
Product: Industry > Category > Product
Region: Country > Region > City > Office
Time: Year > Quarter > Month or Week > Day
A Sample Data Cube
[Figure: data cube with dimensions Product (TV, PC, VCR, sum), Date (1Qtr, 2Qtr, 3Qtr, 4Qtr, sum), and Country (China, India, Japan, sum); the highlighted cell is the total annual sales of TV in China]
On-Line Analytical Processing
(OLAP)
Sample Operations
Roll up: summarize data
total sales volume last year by product category
by region
Roll down, drill down, drill through: go from
higher level summary to lower level summary
or detailed data
For a particular product category, find the detailed
sales data for each salesperson by date
Slice and dice: select and project
Sales of beverages in the West over the last 6
months
Pivot: reorient cube
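The operations above can be illustrated on a tiny in-memory fact list. This is a sketch under stated assumptions: the field names, the sample rows, and the helper names `roll_up` and `slice_` are invented for the example and are not part of any OLAP product's API.

```python
from collections import defaultdict

FIELDS = ("category", "product", "region", "city", "month", "amount")

# Hypothetical detailed sales facts (the last field is the measure).
sales = [
    ("Beverages", "Cola",  "West", "Seattle",  "1998-01", 100),
    ("Beverages", "Tea",   "West", "Portland", "1998-02", 50),
    ("Beverages", "Cola",  "East", "Boston",   "1998-01", 70),
    ("Snacks",    "Chips", "West", "Seattle",  "1998-01", 30),
]


def roll_up(rows, dims):
    """Summarize the measure over the chosen dimensions (a GROUP BY)."""
    idx = [FIELDS.index(d) for d in dims]
    out = defaultdict(int)
    for row in rows:
        out[tuple(row[i] for i in idx)] += row[-1]
    return dict(out)


def slice_(rows, **fixed):
    """Select the rows where the named dimensions have the given values."""
    return [r for r in rows
            if all(r[FIELDS.index(k)] == v for k, v in fixed.items())]


# Roll up: total sales by product category and region.
by_cat_region = roll_up(sales, ["category", "region"])

# Slice and dice: beverages in the West, then drill down to product and city.
west_beverages = slice_(sales, category="Beverages", region="West")
drill = roll_up(west_beverages, ["product", "city"])
print(by_cat_region, drill)
```

A pivot would simply present the same aggregated cells with the dimensions swapped between rows and columns; no new aggregation is needed.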
Cube Operation
SELECT date, product, customer, SUM (amount)
FROM SALES
GROUP BY CUBE (date, product, customer)
Need to compute the following group-bys
(date, product, customer),
(date, product), (date, customer), (product, customer),
(date), (product), (customer),
() the grand total
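The cube operator computes one group-by for every subset of the dimensions, including the empty subset (the grand total), so three dimensions give 2^3 = 8 cuboids. The following is a naive sketch of that definition; the function name `cube` and the sample rows are assumptions for illustration, and real systems share work between cuboids rather than rescanning the data for each one.

```python
from collections import defaultdict
from itertools import combinations


def cube(rows, dims, measure):
    """Return {group-by dimensions: {group key: SUM(measure)}} for every subset of dims."""
    result = {}
    for k in range(len(dims), -1, -1):
        for group in combinations(dims, k):
            agg = defaultdict(float)
            for row in rows:
                agg[tuple(row[d] for d in group)] += row[measure]
            result[group] = dict(agg)
    return result


# Hypothetical SALES rows.
rows = [
    {"date": "d1", "product": "TV", "customer": "c1", "amount": 100.0},
    {"date": "d1", "product": "PC", "customer": "c2", "amount": 200.0},
    {"date": "d2", "product": "TV", "customer": "c1", "amount": 50.0},
]
cuboids = cube(rows, ("date", "product", "customer"), "amount")
for group, cells in cuboids.items():
    print(group, cells)
```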
Cuboid Lattice
Data cube can be viewed as a lattice of cuboids
The bottommost cuboid is the base cube.
The topmost cuboid contains only one cell.
Lattice for relation R, listed from the base cuboid to the single-cell cuboid:
(A,B,C,D)
(A,B,C) (A,B,D) (A,C,D) (B,C,D)
(A,B) (A,C) (A,D) (B,C) (B,D) (C,D)
(A) (B) (C) (D)
( all )
Cube Computation -- Array Based
Algorithm
A MOLAP approach: the base cuboid is stored as a multidimensional array
Read in a number of cells to compute partial cuboids
[Figure: 3-D array with dimensions A, B, C and the cuboid lattice {ABC}; {AB}, {AC}, {BC}; {A}, {B}, {C}; {}]
ROLAP versus MOLAP
ROLAP
exploits services of relational engine effectively
provides additional OLAP services
design tools for DSS schema
performance analysis tool to pick aggregates to
materialize
SQL gets in the way of sequential processing and columnar aggregation
Some queries are hard to formulate and can often
be time consuming to execute
ROLAP versus MOLAP
MOLAP
the storage model is an n-dimensional array
Front-end multidimensional queries map to server
capabilities in a straightforward way
Direct addressing abilities
Handling sparse data in array representation is
expensive
Poor storage utilization when the data is sparse
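The trade-off between direct addressing and storage utilization can be made concrete with a little arithmetic. In this sketch the cube shape (products x days x stores) and the three populated cells are hypothetical; the dictionary stands in for any sparse representation that stores only the non-empty cells.

```python
def dense_cell_count(shape):
    """Number of cells a dense n-dimensional array must allocate."""
    count = 1
    for extent in shape:
        count *= extent
    return count


def offset(cell, shape):
    """Row-major position of a cell: the direct addressing a dense array offers."""
    pos = 0
    for index, extent in zip(cell, shape):
        pos = pos * extent + index
    return pos


shape = (1000, 365, 500)  # hypothetical: products x days x stores

# Sparse alternative: keep only the cells that actually hold a value.
facts = {(3, 10, 7): 120.0, (3, 11, 7): 80.0, (42, 10, 9): 15.0}

dense = dense_cell_count(shape)
utilization = len(facts) / dense
print(dense, offset((3, 10, 7), shape), utilization)
```

The dense array finds any cell with one multiplication chain, but here almost all of its cells would be empty; the sparse form stores three entries and pays for a lookup structure instead.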
Data Mining
What Is Data Mining?
Data mining (knowledge discovery in databases):
Extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) information from data in large databases
Alternative names and their “inside stories”:
Data mining: a misnomer?
Knowledge discovery in databases (KDD: SIGKDD), knowledge
extraction, data archeology, data dredging, information
harvesting, business intelligence, etc.
What is not data mining?
(Deductive) query processing.
Expert systems or small ML/statistical programs
Examples of Interesting
Knowledge
Association rules
98% of people who purchase diapers also buy
beer
Classification
People with age less than 25 and salary > 40k
drive sports cars
Similar time sequences
Stocks of companies A and B perform similarly
Outlier Detection
Residential customers for telecom company with
businesses at home
Motivation: “Necessity is the Mother of
Invention”
Data explosion problem:
Automated data collection tools and mature database technology
lead to tremendous amounts of data stored in databases, data
warehouses and other information repositories.
We are drowning in data, but starving for knowledge!
Data warehousing and data mining:
On-line analytical processing
Extraction of interesting knowledge (rules, regularities, patterns, constraints) from data in large databases.
Data Mining and Business Intelligence
[Pyramid, listed from top to bottom; potential to support business decisions increases toward the top]
Making Decisions
Data Presentation: Visualization Techniques
Data Mining: Information Discovery
Data Exploration: Statistical Analysis, Querying and Reporting
Data Warehouses / Data Marts: OLAP, MDA
Data Sources: Paper, Files, Information Providers, Database Systems, OLTP
Roles, from top to bottom: End User, Business Analyst, Data Analyst, DBA
Data Mining: Confluence of Multiple
Disciplines
Database systems, data warehouse and OLAP
Statistics
Machine learning
Visualization
Information science
High performance computing
Other disciplines:
Neural networks, mathematical modeling, information
retrieval, pattern recognition, etc.
The Data Mining Process
Data Mining: A KDD Process
Data mining: the core of
knowledge discovery process.
[Figure: Databases → Data Cleaning and Data Integration → Data Warehouse → Selection → Task-relevant Data → Data Mining → Pattern Evaluation]
Steps of a KDD Process
Learning the application domain:
relevant prior knowledge and goals of application
Creating a target data set: data selection
Data cleaning and preprocessing: (may take 60% of effort!)
Data reduction and projection:
Find useful features, dimensionality/variable reduction, invariant
representation.
Choosing functions of data mining
summarization, classification, regression, association, clustering.
Choosing the mining algorithm(s)
Data mining: search for patterns of interest
Interpretation: analysis of results.
visualization, transformation, removing redundant patterns, etc.
Use of discovered knowledge.
Data Mining – Some Issues to
Consider
Three Schemes in Classification
Knowledge to be mined:
Summarization (characterization), comparison,
association, classification, clustering, trend, deviation
and pattern analysis, etc.
Mining knowledge at different abstraction levels:
primitive level, high level, multiple-level, etc.
Databases to be mined:
Relational, transactional, object-oriented, object-relational, active, spatial, time-series, text, multimedia, heterogeneous, legacy, etc.
Techniques adopted:
Database-oriented, data warehouse (OLAP), machine
learning, statistics, visualization, neural network, etc.
Data Mining: Classification Schemes
General functionality:
Descriptive data mining
Predictive data mining
Different views, different classifications:
Kinds of knowledge to be discovered,
Kinds of databases to be mined, and
Kinds of techniques adopted.
Data Mining Functionality
Concept description: Characterization and Comparison:
Generalize, summarize, and possibly contrast data
characteristics, e.g., dry vs. wet regions.
Association:
From association, correlation, to causality.
Finding rules like “inside(x, city) → near(x, highway)”.
Classification and Prediction:
Classify data based on the values in a classifying attribute, e.g.,
classify countries based on climate, or classify cars based on
gas mileage.
Predict some unknown or missing attribute values based on
other information.
Data Mining Functionality
(Cont.)
Clustering:
Group data to form new classes, e.g., cluster houses to find
distribution patterns.
Time-series analysis:
Trend and deviation analysis: Find and characterize evolution
trend, sequential patterns, similar sequences, and deviation
data, e.g., stock analysis.
Similarity-based pattern-directed analysis: Find and
characterize user-specified patterns in large databases.
Cyclicity/periodicity analysis: Find segment-wise or total cycles
or periodic behaviours in time-related data.
Other pattern-directed or statistical analysis
Data Mining: On What Kind of
Data?
Relational databases
Data warehouses
Transactional databases
Advanced DB systems and information repositories
Object-oriented and object-relational databases
Spatial databases
Time-series data and temporal data
Text databases and multimedia databases
Heterogeneous and legacy databases
WWW
Are All the “Discovered” Patterns
Interesting?
A data mining system/query may generate thousands of patterns; not all of them are interesting.
Suggested approach: Query-based, focused mining
Interestingness measures: A pattern is interesting if it is
easily understood by humans
valid on new or test data with some degree of certainty.
potentially useful
novel, or validates some hypothesis that a user seeks to confirm
Objective vs. subjective interestingness measures:
Objective: based on statistics and structures of patterns, e.g.,
support, confidence, etc.
Subjective: based on user’s beliefs in the data, e.g.,
unexpectedness, novelty, etc.
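The objective measures named above, support and confidence, can be computed directly from transaction data. A minimal sketch, assuming market-basket transactions represented as sets; the four baskets are made up for illustration.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(transactions, lhs, rhs):
    """Estimated P(rhs | lhs) for the rule lhs -> rhs."""
    return support(transactions, set(lhs) | set(rhs)) / support(transactions, lhs)


# Hypothetical market baskets.
transactions = [
    {"diapers", "beer", "milk"},
    {"diapers", "beer"},
    {"diapers", "bread"},
    {"milk", "bread"},
]
rule_support = support(transactions, {"diapers", "beer"})
rule_confidence = confidence(transactions, {"diapers"}, {"beer"})
print(rule_support, rule_confidence)
```

Here the rule diapers -> beer holds in half of all baskets (support) and in two of the three baskets that contain diapers (confidence); a miner would report only rules that clear user-chosen thresholds on both.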
Can It Find All and Only Interesting
Patterns?
Find all the interesting patterns: Completeness.
Can a data mining system find all the interesting
patterns?
Search for only interesting patterns: Optimization.
Can a data mining system find only the interesting
patterns?
Approaches
First generate all the patterns and then filter out the uninteresting ones.
Generate only the interesting patterns --- mining query optimization
Requirements and Challenges in Data
Mining
Mining methodology issues
Mining different kinds of knowledge in databases.
Interactive mining of knowledge at multiple levels of
abstraction.
Incorporation of background knowledge
Data mining query languages and ad-hoc data mining.
Expression and visualization of data mining results.
Handling noise and incomplete data
Pattern evaluation: the interestingness problem.
Performance issues:
Efficiency and scalability of data mining algorithms.
Parallel, distributed and incremental mining methods.
Requirements/Challenges in Data Mining
(Cont.)
Issues relating to the variety of data types:
Handling relational and complex types of data
Mining information from heterogeneous databases and
global information systems.
Issues related to applications and social impacts:
Application of discovered knowledge.
Domain-specific data mining tools
Intelligent query answering
Process control and decision making.
Integration of the discovered knowledge with existing
knowledge: A knowledge fusion problem.
Protection of data security and integrity.
Amazon.com: Case study in
building customer loyalty
The continuing relationship …
Amazon.com “Loyalty” model
Need Creation: anticipate/stimulate
Information search: provide/assist
Evaluate alternatives: assist/negate
Purchase transaction: optimise/reward
Post purchase experience: add value
Need Creation
(attract to website)
Need Creation: anticipate/stimulate
Further Need Creation
(upon reaching website)
Information Search
Information search: provide/assist
Evaluation of Alternatives
Evaluate alternatives: assist/negate
Purchase Optimisation/Reward
Purchase transaction: optimise/reward
•1-click purchase
•‘slippery check out counter’ vs. ‘sticky aisles’
Post-purchase experience
Post purchase experience: add value
Account Management
Why is loyalty important
Amazon’s ‘customer lifetime value’ model (for book buyers)
Average $50 for first time purchase
Average $40 per visit thereafter
Average of one visit per 2 months
Assume customer will be active for 10 years – not
validated yet ☺
‘4 buys and you are hooked’ empirical law
Use Alexa data to bring back ‘prodigal sons’
(and daughters)
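The figures in this model can be combined into a rough lifetime value. The arithmetic below is a sketch under two assumptions that the slide does not state: the first purchase counts as one of the visits, and future revenue is not discounted. The function name `lifetime_value` is hypothetical.

```python
def lifetime_value(first_purchase, per_visit, visits_per_year, years):
    """Undiscounted revenue over the customer's active life."""
    total_visits = visits_per_year * years
    # The first visit earns first_purchase; every later visit earns per_visit.
    return first_purchase + per_visit * (total_visits - 1)


# $50 first purchase, $40 per later visit, one visit per 2 months, 10 active years.
ltv = lifetime_value(first_purchase=50, per_visit=40, visits_per_year=6, years=10)
print(ltv)
```

Under these assumptions the total is 50 + 40 x 59 = 2410 dollars of revenue per customer, which is why shortening the time to the fourth purchase matters so much.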
Build more loyalty faster
[Chart: “Loyalty” LTV versus Time]
The ‘Virtuous Cycle’
Purchase
response
Buying
decision/process
Customer
knowledge
Internet Marketing Insight –
Jeff Bezos
Role of
Advertisement – get customer to the store
Customer experience – get customer to buy
Brick & mortar stores
Getting customer to store is the hard part
Shopping cart abandonment is not common, since the
overhead of going to another store is very high – especially
in Minnesota winters!
Marketing expenses
80% for advertisement; 20% for customer experience
The 80-20 rule is reversed for on-line stores
– Jeff Bezos
Remarks on Amazon.com
A very innovative company – the poster child
for e-commerce
Is pushing the envelope in personalization
Customers love it
Will it make money – we’re all waiting to see
A company of the future, with a product of
the past, in a market of the present
The Analytics Behind
e-Marketing
Web Logs – Record of consumer
behavior
looney.cs.umn.edu han - [09/Aug/1996:09:53:52 -0500] "GET mobasher/courses/cs5106/cs5106l1.html HTTP/1.0" 200
mega.cs.umn.edu njain - [09/Aug/1996:09:53:52 -0500] "GET / HTTP/1.0" 200 3291
mega.cs.umn.edu njain - [09/Aug/1996:09:53:53 -0500] "GET /images/backgnds/paper.gif HTTP/1.0" 200 3014
mega.cs.umn.edu njain - [09/Aug/1996:09:54:12 -0500] "GET /cgi-bin/Count.cgi?df=CS home.dat\&dd=C\&ft=1 HTTP
mega.cs.umn.edu njain - [09/Aug/1996:09:54:18 -0500] "GET advisor HTTP/1.0" 302
mega.cs.umn.edu njain - [09/Aug/1996:09:54:19 -0500] "GET advisor/ HTTP/1.0" 200 487
looney.cs.umn.edu han - [09/Aug/1996:09:54:28 -0500] "GET mobasher/courses/cs5106/cs5106l2.html HTTP/1.0" 200
...
...
...
Access Log Format
Fields: IP address, userid, time, method, url, protocol, status, size
Example (fields shown): IP address = mega.cs.umn.edu, userid = njain, time = 09/Aug/1996:09:54:31, url = advisor/csci-faq.html
Other Server Logs: referrer logs, agent logs
Application server logs: business event logging
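Log lines in the format shown above can be turned into records with a regular expression. This is a sketch based only on the sample lines: the pattern assumes a host, a userid, one ignored field, a bracketed timestamp, a quoted request, a status code, and an optional size. The names `LINE` and `parse` are illustrative.

```python
import re
from datetime import datetime

LINE = re.compile(
    r'(?P<host>\S+) (?P<user>\S+) \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>.*?) (?P<protocol>HTTP/[\d.]+)" '
    r'(?P<status>\d{3})(?: (?P<size>\d+))?'
)


def parse(line):
    """Return a dict of fields, or None if the line does not match the pattern."""
    m = LINE.match(line)
    if m is None:
        return None
    rec = m.groupdict()
    rec["time"] = datetime.strptime(rec["time"], "%d/%b/%Y:%H:%M:%S %z")
    rec["status"] = int(rec["status"])
    rec["size"] = int(rec["size"]) if rec["size"] else None
    return rec


sample = 'mega.cs.umn.edu njain - [09/Aug/1996:09:53:52 -0500] "GET / HTTP/1.0" 200 3291'
redirect = 'mega.cs.umn.edu njain - [09/Aug/1996:09:54:18 -0500] "GET advisor HTTP/1.0" 302'
rec = parse(sample)
rec_redirect = parse(redirect)
print(rec, rec_redirect)
```

Grouping the parsed records by host or userid and ordering them by time is the usual first step toward sessions, which are the unit of analysis for the pipeline and segmentation models that follow.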
Shopping Pipeline Analysis
[State transition diagram: Enter store, Browse catalog, Select items, Complete purchase; annotated with ‘sticky’ states, a ‘slippery’ state (i.e. 1-click buy), cross-sell promotions, and up-sell promotions]
Overall goal:
•Maximize probability of reaching final state
•Maximize expected sales from each visit
Shopping pipeline modeled as state transition diagram
Sensitivity analysis of state transition probabilities
Promotion opportunities identified
E-metrics and ROI used to measure effectiveness
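Treating the pipeline as a state transition diagram makes the first goal computable: the probability of reaching the final state from the entry state. The sketch below uses made-up transition probabilities and an added `exit` state (an assumption, standing for leaving the site), and solves for the reach probabilities by simple fixed-point iteration.

```python
def reach_probability(transitions, goal, sweeps=200):
    """Probability of eventually reaching goal from each state."""
    prob = {state: 0.0 for state in transitions}
    prob[goal] = 1.0
    for _ in range(sweeps):
        for state, nxt in transitions.items():
            if state != goal and nxt:
                prob[state] = sum(p * prob[t] for t, p in nxt.items())
    return prob


# Hypothetical transition probabilities; each row sums to 1.
pipeline = {
    "enter":    {"browse": 0.8, "exit": 0.2},
    "browse":   {"select": 0.5, "browse": 0.3, "exit": 0.2},
    "select":   {"purchase": 0.6, "browse": 0.2, "exit": 0.2},
    "purchase": {},
    "exit":     {},
}
baseline = reach_probability(pipeline, "purchase")["enter"]

# Sensitivity analysis: a smoother checkout moves 0.1 from exit to purchase.
improved_pipeline = {**pipeline,
                     "select": {"purchase": 0.7, "browse": 0.2, "exit": 0.1}}
improved = reach_probability(improved_pipeline, "purchase")["enter"]
print(baseline, improved)
```

Repeating the comparison for each transition shows where a promotion or design change would raise the purchase probability the most, which is the sensitivity analysis the slide refers to.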
Original Amazon Model for
Customer Segmentation
[Chart: dollars spent in past quarter (500 to 1500) versus number of purchases in past quarter (1 to 7), divided into light, medium, heavy, and super heavy buyers]
Customer M - medium
Customer H - heavy
Data Driven Customer
Segmentation Model
• modeled customers in a 4-dim space: recency, frequency, monetary, tenure
• used PCA to determine relative weights of each dimension
• Composite Score = w1*recency + w2*frequency + w3*monetary + w4*tenure
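The composite score is a weighted sum over the four dimensions. The sketch below is illustrative only: the weights are placeholders rather than PCA output, and the min-max scaling, the value ranges, and the choice to invert recency (so that a more recent purchase scores higher) are assumptions that the slide does not specify.

```python
def composite_score(customer, weights, ranges):
    """Weighted sum of the dimensions after scaling each one to [0, 1]."""
    score = 0.0
    for dim, weight in weights.items():
        low, high = ranges[dim]
        scaled = (customer[dim] - low) / (high - low)
        if dim == "recency":
            scaled = 1.0 - scaled  # fewer days since last purchase is better
        score += weight * scaled
    return score


# Placeholder weights (w1..w4) and assumed value ranges per dimension.
weights = {"recency": 0.3, "frequency": 0.3, "monetary": 0.3, "tenure": 0.1}
ranges = {"recency": (0, 90), "frequency": (0, 8),
          "monetary": (0, 1500), "tenure": (0, 12)}

# recency in days, frequency in purchases, monetary in dollars, tenure in months.
customer = {"recency": 10, "frequency": 4, "monetary": 480, "tenure": 3}
score = composite_score(customer, weights, ranges)
print(round(score, 3))
```

Because the scaling and weights here are invented, the resulting number is not comparable to the percentages reported in the case study; only the structure of the calculation carries over.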
Customer Score Interpretation
          Recency   Frequency  Monetary  Tenure     Composite Score
…         …         …          …         …          …
Cust M    10 days   4 times    $480      3 months   80%
…         …         …          …         …          …
Cust H    30 days   2 times    $900      10 months  72%
…         …         …          …         …          …
• Cust M => frequent visitor but low spender
  => potential for acquiring higher wallet share
  => focus on improving relationship
• Cust H => infrequent visitor but heavy spender
  => focus on sustaining relationship
Yodlee.com: Case study in
web business intelligence
Current Situation: Consumer
Confusion
“It takes me two hours to
get to all my accounts”
“I can’t look at my assets
across accounts”
“I can’t remember all my
user IDs and passwords”
“I want the web to
work for me, not the
other way around”
“This is overwhelming……I
need some help”
“Make it easier for me!”
Solution –
Personal
Information
Aggregation
Aggregation Service Model
[Figure: aggregation service model. Content partners: communication site, finance site, travel site. Aggregation service provider capabilities: content acquisition; aggregation, analysis, personalization. Applications (presentation & interaction): AOL (AOLfinance), Citibank (MyCiti). End users: mobile user, connected user.]
Business Intelligence Benefits to
Corporation
‘Tip-of-the-iceberg’ analysis for a
brokerage house
Lifestyle preference analysis of banking
customers for a survey
‘True-wallet-share’ analysis for a credit
card organization
Dynamic targeting for banner
advertisements, e-mail campaigns, etc.
‘Tip-of-the-Iceberg’ Analysis for a
Brokerage House
Asset Based Tiers    Number of Users
< $20K               7579
$20K - $100K         2539
$100K - $500K        1994
$500K - $1M          525
$1M - $5M            547
$5M - $25M           106
> $25M               9
• This brokerage house treated customers with net worth > $1M as ‘high net worth’ (HNW) customers with specialized services
• Almost none of the customers in the green region had > $1M with this brokerage
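The idea behind this analysis can be sketched in a few lines of Python: assign each user to a tier by total aggregated assets, then flag users who clear the $1M threshold overall but not on the assets the brokerage itself can see. The tier labels come from the slide. The boundary handling, the field names `at_brokerage` and `elsewhere`, and the sample users are hypothetical assumptions.

```python
from bisect import bisect_right

# Tier boundaries in dollars, matching the tiers on the slide. Treating each
# lower bound as inclusive is an assumption.
TIER_BOUNDS = [20_000, 100_000, 500_000, 1_000_000, 5_000_000, 25_000_000]
TIER_LABELS = ["< $20K", "$20K - $100K", "$100K - $500K", "$500K - $1M",
               "$1M - $5M", "$5M - $25M", "> $25M"]


def asset_tier(total_assets):
    """Label of the asset tier that `total_assets` falls into."""
    return TIER_LABELS[bisect_right(TIER_BOUNDS, total_assets)]


def hidden_hnw(users, threshold=1_000_000):
    """IDs of users who are HNW overall but not on brokerage-held assets alone."""
    hidden = []
    for user in users:
        total = user["at_brokerage"] + user["elsewhere"]
        if total > threshold and user["at_brokerage"] <= threshold:
            hidden.append(user["id"])
    return hidden


# Hypothetical users, for illustration only.
USERS = [
    {"id": "u1", "at_brokerage": 15_000, "elsewhere": 30_000},
    {"id": "u2", "at_brokerage": 80_000, "elsewhere": 1_500_000},
    {"id": "u3", "at_brokerage": 2_000_000, "elsewhere": 0},
]

if __name__ == "__main__":
    for user in USERS:
        print(user["id"], asset_tier(user["at_brokerage"] + user["elsewhere"]))
    print("hidden HNW:", hidden_hnw(USERS))
```

In this toy data only `u2` is flagged: the brokerage sees $80K, while the aggregated view shows more than $1M.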
Household Lifestyle Preference
Analysis for a Survey
Financial Preferences
- 53% have at least one online banking account
- 51% have an online credit card account, higher than Yodlee users as a whole
- 31% also have an E*Trade account, and 11% also have a Schwab account
- Have a preference for FirstUSA over Citibank, the opposite of the preference among users as a whole
- The most popular credit card is American Express
Lifestyle Preferences
- 25% make travel reservations online, fewer than users as a whole
- Expedia is more popular as an online travel site than Travelocity
- 49% have a frequent flier account, higher than users as a whole
- The favorite frequent flier programs are United, Delta, American, in that order
- Half as many co-brand users shop on eBay as users as a whole
‘True-Wallet-Share’ Analysis for a
Credit Card Organization
Each card column shows the average balance in dollars, with the number of users holding that card in parentheses.
Range            Total Users  Discover        American Express  Mastercard     Visa            Other             Average Total
< $100           462          4.13 (73)       -467.40 (152)     0              -29.76 (87)     -60.29 (272)      -190.74
$100 - $200      232          -12.61 (39)     120.17 (66)       0              89.95 (40)      167.10 (156)      149.44
$200 - $500      643          36.97 (107)     253.77 (207)      0              218.93 (135)    272.42 (421)      342.99
$500 - $1000     968          75.57 (182)     571.09 (378)      0              597.83 (217)    623.36 (593)      893.47
$1000 - $2000    1386         174.55 (292)    988.97 (540)      837.25 (1)     1018.50 (323)   1078.01 (866)     1471.38
$2000 - $5000    1732         263.27 (432)    2156.30 (1099)    957.69 (1)     2087.75 (601)   2358.22 (1579)    1921.16
$5000 - $10000   1696         620.80 (354)    4091.64 (814)     3648.40 (3)    3976.93 (483)   4966.61 (1200)    3297.58
$10000+          2422         1332.48 (452)   10111.75 (1010)   22329.56 (9)   8934.39 (642)   14649.52 (1341)   7100.20
Analysis of credit card balance habits of user base
• There are 1386 people, each of whom carries a total balance between $1000
and $2000 across all the credit cards that (s)he owns
• 292 of these 1386 people own Discover cards, and carry an average balance
of $174.55
• 540 of these 1386 people own AmEx cards, with an average balance of
$988.97
• 323 of these 1386 people carry one or more Visa cards, with an average Visa
network balance of $1018.50
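The figures quoted in the bullets above can be turned into a rough share-of-balance estimate for the $1000 - $2000 segment. The Python sketch below is one plausible reading of 'wallet share', not the method used in the case study: it multiplies each average balance by the number of holders and compares the resulting totals. Only the three networks named in the bullets are included, so the shares are relative to those three alone.

```python
# (average balance in $, number of holders) for the $1000 - $2000 segment,
# taken from the bullets above. Mastercard and "Other" are left out, so the
# shares below are relative to these three networks only.
SEGMENT = {
    "Discover": (174.55, 292),
    "AmEx": (988.97, 540),
    "Visa": (1018.50, 323),
}


def wallet_share(segment):
    """Fraction of the segment's combined balance carried on each network."""
    totals = {card: avg * holders for card, (avg, holders) in segment.items()}
    grand_total = sum(totals.values())
    return {card: total / grand_total for card, total in totals.items()}


if __name__ == "__main__":
    for card, share in wallet_share(SEGMENT).items():
        print(f"{card}: {share:.1%}")
```

Weighting by the number of holders matters here: Visa has the highest average balance, but AmEx has more holders in this segment and therefore the larger combined balance.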
Business Implications of
True Wallet Share Analysis
A credit card offeror knows exactly how much money customers
holding its cards spend (every month) on its card vs. that on the
competition’s cards
Offeror can target users falling within various segments for
specific customer acquisition, retention, etc. purposes
Detailed profile and history information of these users can be
used for precision targeting and customer messaging through
various channels including ad serving, e-mail campaigns,
promotions, etc.
If transaction level detail information of these users is analyzed,
it can be determined exactly which credit cards are being used
by aggregation users as a whole for what kind of lifestyle
activity, e.g. travel, entertainment, shopping, groceries, etc.; this
can help the partner decide which market segments to focus on
Business Implications (contd.)
The analysis above, if carried out at the individual user level,
can be used to target individual customers with specific
promotions, etc.
Transaction level detail can be classified into charges to specific
organizations, department stores, airlines, etc. This will identify
the top organizations that aggregation users spend money at,
either on the partner’s card or on a competing network. This
would be useful in determining which organizations to partner
with for customer retention, and acquisition, respectively
All of these analyses, if performed periodically and tracked over
time, can provide valuable insight into the evolving credit balance
distribution and usage behavior at the user population or
individual user level
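The merchant-level analysis described on this slide can be sketched as a simple aggregation. The Python example below is illustrative only: the transaction fields (`merchant`, `network`, `amount`), the network names, and the sample data are hypothetical assumptions.

```python
from collections import defaultdict


def top_merchants(transactions, partner_network, top_n=3):
    """Rank merchants by total spend, separately for the partner's card
    and for competing networks."""
    spend = {"partner": defaultdict(float), "competitor": defaultdict(float)}
    for txn in transactions:
        side = "partner" if txn["network"] == partner_network else "competitor"
        spend[side][txn["merchant"]] += txn["amount"]
    return {
        side: sorted(by_merchant.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
        for side, by_merchant in spend.items()
    }


# Hypothetical transactions, for illustration only.
TRANSACTIONS = [
    {"merchant": "Airline A", "network": "PartnerCard", "amount": 400.0},
    {"merchant": "Store B", "network": "OtherCard", "amount": 250.0},
    {"merchant": "Airline A", "network": "OtherCard", "amount": 100.0},
    {"merchant": "Store B", "network": "PartnerCard", "amount": 50.0},
    {"merchant": "Grocer C", "network": "OtherCard", "amount": 300.0},
]

if __name__ == "__main__":
    ranked = top_merchants(TRANSACTIONS, "PartnerCard")
    print("retention candidates:", ranked["partner"])
    print("acquisition candidates:", ranked["competitor"])
```

Merchants that rank high on the partner's own card are natural retention partners, while those that rank high on competing networks point to acquisition opportunities.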
Targeted Ad Serving
Targeted Ad Serving (contd.)
Privacy Issues
let’s begin with some real
examples …
Problem: Shopping for spouse’s
anniversary – too much clutter
Solution: Focused and relevant
advertisement
Problem: Tired of mistreatment
by financial institutions …
You have tons of money in your investment
portfolio
But you are over-worked and slipped a couple
of credit card payment deadlines – after all
you are busy managing your investment
portfolio ☺
Credit card institution treats you like a
deadbeat
Solution
Why not let the credit card institution know
what your investment portfolio balance is?
Impress them ☺
Perhaps even authorize credit card company
to transfer funds from your investment
account to cover the payment? Or maybe not
☺
So, what’s the catch…
Shopping example
Allow the vendor to collect detailed information about you
and build an accurate profile
Junk mail is only a nuisance for the receiver, but an expense
for the sender! – the sender wants to avoid it more than the
receiver!!
Credit card example
Allow the credit card company and investment company to
share your information
Multiple online accounts example
Hand over your account names and passwords to
aggregation service
Sounds scary – but over 1.5 million people have done this in
about 18 months’ time!!
let’s now talk about privacy …
Merriam Webster definition
a: the quality or state of being apart from
company or observation b : freedom from
unauthorized intrusion
Justice Louis Brandeis
“the right to be let alone”
Operational definition
Collection and analysis of personal data beyond
some limit
Public Attitude Towards Privacy
A (self-professed) non-scientific
study carried out by a USA Today
reporter
Asked 10 people the
following two questions
Are you concerned about
privacy? 8 said YES
If I buy you a Big Mac,
can I keep the wrapper
(to get fingerprints)? 8
said YES
ACM E-Commerce 2001
paper [Spiekermann et al]
Most people willing to
answer fairly personal
questions to
anthropomorphic web-bot,
even though not relevant to
the task at hand
Different privacy policies had
no impact on behavior
Study carried out in Europe,
where privacy consciousness
is (presumably) higher
Public Attitude (contd.)
Amazon.com (and practically
every commercial site) uses
cookies to identify and track
visitors
97.6% of Amazon.com
customers accepted cookies
Airline frequent flier
programs with cross
promotions
We willingly agree to be
tracked
Get upset if the tracking
fails!
Over 1.5 million people have
trusted the aggregation
service (called Yodlee) with
the names and passwords of
their financial accounts in
less than 18 months
Adoption rate has been over
3 times the most optimistic
projections
Medical data is (perhaps) an exception to this
What people really want
Some people will not share any kind of
private data at any cost – the ‘paranoids’
Some people will share any data for returns –
the ‘Jerry Springerites’
The vast majority in the middle wants
a reasonable level of comfort that private data
about them will NOT be misused
Tangible and compelling benefits in return for
sharing their private data – Big Mac example,
frequent flier programs
Remarks on Privacy
Is it ‘much ado about nothing’?
If indeed data collection were outlawed, and thus
personalization impossible, wouldn’t the public lose – faced
with generic, undifferentiated products/services?
Given the public’s attitude about privacy (as shown in their
actions), are privacy advocates barking up the wrong tree?
Is it just a matter of time, or a generational issue, e.g.
adoption of credit cards?
Where do we stand?
Current position - loss of your privacy may be beneficial for
you
Emerging position (post September 11th ) - loss of your
privacy will be beneficial for everyone
Critical emerging debate - is privacy a right or a
privilege?
Concluding Remarks
Internet is a high bandwidth, low latency, negligible
cost, interactive channel to the customer
Very high adoption rates for this channel
Processing speeds and storage capacities continuing
to increase while costs continue to fall
Data analytics technology has grown rapidly
Customer facing applications are ready for a
paradigm shift
Innovative companies have moved ahead
Privacy is an issue, but not much of a concern