Using SAS/Hadoop to Support Marketing Analytics with Big Data

Kerem Tomak
VP, Marketing Analytics, Macys.com

Agenda
• Who is the customer?
• Life and death of a customer
• Data galore
• Crystal Ball
• What matters the most…

Who is the customer?
• .com
• stores

Life of a customer
• Present value of all future profits obtained from a customer over his or her relationship with a firm.

Customer Lifetime Value
• The CLV of a customer i is the discounted value of the future profits yielded by this customer:

$$CLV_i = \sum_{t=0}^{h} \frac{CF_{i,t}}{(1+d)^t}$$

• Where
  – $CF_{i,t}$ = net cash flow generated by the activity of customer i at time t
  – h = time horizon for estimating the CLV
  – d = discount rate
• The CLV is the value added to the company by an individual customer

Why is CLV important?
• By knowing the CLV of its customers, one can
  – Focus on groups of customers of equal wealth
  – Evaluate the budget of a marketing campaign
  – Measure the efficiency of a past marketing campaign by evaluating the CLV change it induced
• Focus on the most valuable customers, who deserve to be closely followed
• Neglect the less valuable ones, to whom the company should pay less attention
  – Use CLV to introduce new segmentation opportunities

Tapping into the data
• Data Storage
• Reporting
• Analytics
• Advanced Analytics
  – Computing with big datasets is a fundamentally different challenge than doing “big compute” over a small dataset

[Figure labels: utilized data; unutilized data that can be made available to the business]

Hadoop & RDBMS Analogy
RDBMS & Hadoop are like a car & a train

RDBMS: sports car
• refined
• has a lot of features
• accelerates very fast
• pricey
• expensive to maintain

Hadoop: cargo train
• rough
• missing a lot of “luxury”
• slow to accelerate
• carries almost anything
• moves a lot of stuff very efficiently

RDBMS & Hadoop Comparison*

| | Traditional RDBMS (Oracle, DB2) | Hadoop |
|---|---|---|
| Maximum Data Capacity | Up to 100’s of TBs | Up to 10’s of PBs (hundreds of times more) |
| Processing Capacity | Up to 10’s of TBs | Up to 10’s of PBs (thousands of times more) |
| Costs | High software, license and hardware/storage costs | Cost effective: commodity hardware + open source software |
| Transactional | Yes | No (batch process) |
| Update Patterns | Supported | Not supported yet |
| Schema Complexity | Structured (tables only) | Structured or unstructured |
| Processing Freedom | SQL | MapReduce, SQL (Hive), Streaming, Pig, HBase, etc. |
| Scalability | Non-linear scaling | Fully distributed and linearly scalable |
| Reliability | Fault-tolerant at high cost, but without self-healing by design | Fault-tolerant and self-healing by design |
| Real Time Response | Yes | No (HBase required) |

* Cloudera comparison chart

Crystal Ball
Source: Forrester

Toolshed

What matters the most
• Building data infrastructure
  – Fast processing of large amounts of data and deployment of model scoring in the same environment
• Business task execution
  – Real-time optimization for customized offer management
• Planning tools
  – Give analytical guidelines to campaign management
• Strategic support
  – Develop robust analytics that look at the customer’s environment

“Making sense out of models”
“Deploying in production”

Questions?
Kerem Tomak
kerem.tomak@macys.com
4154221408
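
Appendix: a worked CLV sketch

The Customer Lifetime Value slide defines CLV as the discounted value of a customer's future net cash flows, in terms of CF, h and d. The Python sketch below illustrates that definition under stated assumptions: the sum is taken to run from t = 0 to h (the slide text does not state the starting index), the cash flows and the 10% discount rate are made-up illustration values, and the function name `customer_lifetime_value` is hypothetical rather than anything from the talk.

```python
from typing import Dict, List, Sequence


def customer_lifetime_value(cash_flows: Sequence[float], discount_rate: float) -> float:
    """Discounted sum of net cash flows CF_t for t = 0..h.

    Assumption: cash_flows[t] is the net cash flow at period t, so the
    horizon h equals len(cash_flows) - 1 and period 0 is not discounted.
    """
    if discount_rate <= -1.0:
        raise ValueError("discount_rate must be greater than -1")
    return sum(cf / (1.0 + discount_rate) ** t for t, cf in enumerate(cash_flows))


# Hypothetical yearly net cash flows per customer (illustration only).
customers: Dict[str, List[float]] = {
    "A": [100.0, 100.0, 100.0],   # steady spender
    "B": [-50.0, 200.0, 200.0],   # costly to acquire, more valuable later
}

d = 0.10  # assumed discount rate

clv = {cid: customer_lifetime_value(flows, d) for cid, flows in customers.items()}

# Rank customers from most to least valuable, as the "Why is CLV important?"
# slide suggests for prioritising marketing attention.
ranked = sorted(clv, key=clv.get, reverse=True)

for cid in ranked:
    print(f"customer {cid}: CLV = {clv[cid]:.2f}")
```

With these illustrative numbers, customer B ranks above customer A even though B starts with a negative cash flow, which is the kind of distinction a ranking by CLV is meant to surface.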