Using SAS/Hadoop to Support Marketing Analytics with Big Data

Kerem Tomak
VP, Marketing Analytics, Macys.com
Agenda
• Who is the customer?
• Life and death of a customer
• Data galore
• Crystal Ball
• What matters the most…
Who is the customer?
• .com
• stores
Life of a customer
• Present value of all future profits obtained from a customer over the lifetime of his or her relationship with a firm
Customer Lifetime Value
• The CLV of customer i is the discounted value of the future profits yielded by this customer:
$$CLV_i = \sum_{t=0}^{h} \frac{CF_{i,t}}{(1+d)^t}$$
• Where
– $CF_{i,t}$ = net cash flow generated by customer i's activity at time t
– h = time horizon for estimating the CLV
– d = discount rate
• CLV is the value that an individual customer adds to the company
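The discounting above can be sketched in a few lines of Python. This is a minimal illustration, assuming the cash flows are listed per period starting at t = 0 (so the first one is not discounted) and that d is a per-period rate such as 0.10 for 10%; the function name and the sample figures are hypothetical.

```python
def customer_lifetime_value(cash_flows, discount_rate):
    """Discounted sum of one customer's net cash flows.

    cash_flows[t] plays the role of CF_{i,t} for t = 0..h (h = len(cash_flows) - 1);
    discount_rate plays the role of d, as a per-period rate (0.10 means 10%).
    """
    if discount_rate <= -1:
        raise ValueError("discount_rate must be greater than -1")
    # Each period's cash flow is divided by (1 + d)^t before summing.
    return sum(cf / (1 + discount_rate) ** t for t, cf in enumerate(cash_flows))


# Hypothetical customer: three yearly net cash flows, 10% discount rate.
print(round(customer_lifetime_value([100.0, 110.0, 121.0], 0.10), 2))  # prints 300.0
```

In practice the future cash flows would themselves be predictions (for example from a retention or spend model); the function only performs the discounting step.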
Why is CLV important?
• By knowing the CLV of its customers, a company can
– Focus on groups of customers of similar value
– Evaluate the budget of a marketing campaign
– Measure the efficiency of a past marketing campaign by evaluating the change in CLV it produced
– Focus on the most valuable customers, who deserve to be followed closely
– Deprioritize the less valuable ones, to whom the company should pay less attention
– Use CLV to identify new segmentation opportunities
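Two of the uses above, grouping customers of similar value and measuring a campaign by its CLV change, can be sketched as follows. This is a hypothetical illustration, not the method used in the talk: the quartile-based tiers and both function names are assumptions, and the campaign measure is a plain before/after difference, which does not separate the campaign's effect from other changes unless it is compared against a control group.

```python
from statistics import quantiles


def clv_tiers(clv_by_customer, n_tiers=4):
    """Assign each customer a tier 1..n_tiers by CLV quantile (1 = most valuable).

    Hypothetical helper; needs at least two customers for quantiles() to work.
    """
    cuts = quantiles(list(clv_by_customer.values()), n=n_tiers)  # n_tiers - 1 ascending cut points
    tiers = {}
    for customer, clv in clv_by_customer.items():
        rank = sum(clv > cut for cut in cuts)  # 0 = lowest band ... n_tiers - 1 = highest band
        tiers[customer] = n_tiers - rank       # flip so tier 1 is the most valuable
    return tiers


def campaign_clv_lift(clv_before, clv_after):
    """Total CLV change for customers present in both snapshots (naive, no control group)."""
    common = clv_before.keys() & clv_after.keys()
    return sum(clv_after[c] - clv_before[c] for c in common)


# Hypothetical CLV values per customer.
print(clv_tiers({"a": 50, "b": 120, "c": 300, "d": 900}))
print(campaign_clv_lift({"a": 50, "b": 120}, {"a": 80, "b": 110, "e": 40}))
```

Quantile tiers keep the segment sizes roughly equal; fixed CLV thresholds would be an equally reasonable choice if the business prefers stable segment definitions over time.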
Tapping into the data
• Data Storage
• Reporting
• Analytics
• Advanced Analytics
– Computing with big datasets is a fundamentally different challenge than doing “big compute” over a small dataset
[Figure: utilized data vs. unutilized data that could be made available to the business]
Hadoop & RDBMS Analogy
RDBMS and Hadoop are like a car and a train
RDBMS
Sports car:
• refined
• has a lot of features
• accelerates very fast
• pricey
• expensive to maintain
Hadoop
Cargo train:
• rough
• missing a lot of “luxury”
• slow to accelerate
• carries almost anything
• moves a lot of stuff very efficiently
RDBMS & Hadoop Comparison*
| | Traditional RDBMS (Oracle, DB2) | Hadoop |
|---|---|---|
| Maximum data capacity | Up to hundreds of TBs | Up to tens of PBs (hundreds of times more) |
| Processing capacity | Up to tens of TBs | Up to tens of PBs (thousands of times more) |
| Costs | High software, license, and hardware/storage costs | Cost-effective: commodity hardware + open-source software |
| Transactional | Yes | No (batch processing) |
| Update patterns | Supported | Not supported yet |
| Schema complexity | Structured (tables only) | Structured or unstructured |
| Processing freedom | SQL | MapReduce, SQL (Hive), Streaming, Pig, HBase, etc. |
| Scalability | Non-linear scaling | Fully distributed and linearly scalable |
| Reliability | Fault-tolerant at high cost, but not self-healing by design | Fault-tolerant and self-healing by design |
| Real-time response | Yes | No (requires HBase) |
* Cloudera comparison chart
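The MapReduce model listed under “Processing freedom” splits a job into a map step, a shuffle that groups intermediate pairs by key, and a reduce step. The sketch below simulates those three phases in plain Python to total net cash flow per customer, the input a CLV calculation needs. It is an in-process illustration of the programming model only, not Hadoop's API; the `customer_id,amount` record format and all function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Map: parse one hypothetical 'customer_id,amount' line into a (key, value) pair."""
    customer_id, amount = record.split(",")
    yield customer_id, float(amount)


def shuffle_phase(pairs):
    """Shuffle: group all values by key (Hadoop does this across the cluster)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key, here by summing them."""
    return key, sum(values)


def run_mapreduce(splits):
    """Run map over every input split, then shuffle, then reduce per key."""
    mapped = chain.from_iterable(map_phase(r) for split in splits for r in split)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


# Two hypothetical input splits, as if stored in two HDFS blocks.
splits = [["c1,20.0", "c2,5.5"], ["c1,-3.0", "c3,12.0"]]
print(run_mapreduce(splits))  # prints {'c1': 17.0, 'c2': 5.5, 'c3': 12.0}
```

Because each map call looks at only one record and each reduce call at only one key, a framework can run them on many machines in parallel, close to where the data blocks are stored; this is the property behind the “linearly scalable” row in the table.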
Crystal Ball
Source: Forrester
Toolshed
What matters the most
• Building data infrastructure
– Fast processing of large amounts of data, with model scoring deployed in the same environment
• Business task execution
– Real-time optimization for customized offer management
• Planning tools
– Give analytical guidelines to campaign management
• Strategic support
– Develop robust analytics that look at the customer’s environment
“Making sense out of models” “Deploying in production”
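One way to read “deployment of model scoring on the same environment” is to run the scoring step as a map-only job on the cluster, so customer records are scored where they are stored instead of being exported. The sketch below is written in the style of a Hadoop Streaming mapper, which reads records on standard input and writes `key<TAB>value` lines to standard output. It is a hypothetical example: the logistic-regression coefficients, feature names, and record layout are assumptions, not a model from this talk.

```python
import math
import sys

# Hypothetical response model: logistic-regression coefficients trained elsewhere
# and shipped to each node together with this mapper script.
INTERCEPT = -2.0
WEIGHTS = {"visits_30d": 0.15, "orders_365d": 0.40, "days_since_last_order": -0.01}
FEATURES = list(WEIGHTS)  # expected column order after customer_id


def score(features):
    """Logistic score in (0, 1) for one customer's feature vector."""
    z = INTERCEPT + sum(WEIGHTS[name] * x for name, x in zip(FEATURES, features))
    return 1.0 / (1.0 + math.exp(-z))


def score_lines(lines):
    """Turn 'customer_id,f1,f2,f3' lines into 'customer_id<TAB>score' lines."""
    for line in lines:
        fields = line.strip().split(",")
        if len(fields) != len(FEATURES) + 1:
            continue  # skip malformed records instead of failing the whole task
        customer_id, *raw = fields
        yield f"{customer_id}\t{score([float(v) for v in raw]):.4f}"


def main():
    # A streaming mapper reads input records on stdin and emits key<TAB>value on stdout.
    for out in score_lines(sys.stdin):
        print(out)
```

Saved as a script that ends with a call to `main()`, it could be passed to Hadoop Streaming through its `-mapper` option; since scoring needs no aggregation, the job could run without reducers. Keeping `score_lines` separate from the stdin loop makes it easy to test the scoring logic locally before deploying it.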
Questions?
Kerem Tomak
kerem.tomak@macys.com
4154221408