Data Analytics for Customer Facing Applications
Jaideep Srivastava
Computer Science & Engineering
srivasta@cs.umn.edu
2/25/2008
© Jaideep Srivastava

Presentation Outline
• Technology trends
• Customer facing applications
• Status of CRM efforts
• Analytical CRM
  - Customer segmentation
  - Customer loyalty
  - Customer retention
• Analytical CRM architecture
  - Data warehouse
  - Dimensional data modeling
  - On-line analytical processing (OLAP)
  - Data mining
• Amazon.com: case study in building customer loyalty
• Analytics behind emarketing
• Yodlee.com: case study in web business intelligence
• Privacy issues
• Conclusion

Technology Trends
• Internet growth
  - Faster than any other infrastructure
• Data collection
  - Rapid drop in storage costs
  - Dramatic improvement in resolution and rate of data collection ‘probes’
• Data analytics
  - Increasing deployment of warehouses
  - Major leap forward in data mining technologies and tools
• Becoming possible to really understand what your customers want, even at the individual level!

Infrastructure Adoption in the US
[Figure: millions of users (0 to 120) versus year (1922 to 2000) for Radio, TV, Cable, and Internet]

Product Marketing – 75 years ago
• Production – a la Adam Smith
• "You can have any color as long as it's black" – Ford Motor Co.

Product Marketing - today
• Add the spice of flexibility, courtesy of robotics, computers …

New approach to marketing
• FROM: Finding customers that are right for each product
• TO: Finding products that are right for each customer
• TURN the process through 90 degrees
[Figure: Products 1, 2, 3, 4, 5, …]
To achieve this we need to align around:
• Organization and culture
• Business processes and skill
• Measurement and incentives
• Information management
• Technology

“Mass Customization” – B. Joseph Pine
Mass production
• Cheap to produce
• Efficient to produce
• Uniform features/quality
• ‘one size fits all’ approach
• Optimize production cost
Customization
• Expensive to produce
• Inefficient to produce
• Customized features
• ‘tailor made’ approach
• Optimize customer satisfaction
Mass customization
• Cheap & efficient to produce
• Customized features
• ‘tailor made’ approach
• Optimize production cost & customer satisfaction

Customer Facing Applications

Customer Facing Applications: Consumer marketing
• Campaign management
• Opportunity management
• Web-based encyclopedia, configurator
• Market segmentation
• Lead generation/enhancement/tracking

Customer Facing Applications: Customer care & support
• Incident assignment/escalation/tracking/reporting
• Problem management/resolution
• Order management/promise fulfillment
• Warranty/contract management
• Field service support
  - Work orders, dispatching
  - Real time information transfer to field personnel via mobile technologies

Customer Facing Applications: Corporate sales
• Contact management profiles and history
• Account management including activities
• Order entry
• Proposal generation
• Sales management
• Pipeline analysis, e.g. forecasting
• Sales cycle analysis
• Territory alignment
• Roll-up and drill-down reporting

Status of Customer Relationship Management (CRM) Efforts

Companies are spending megabudgets on CRM
• CRM = software + support services
• European CRM expenditure = $1.2B + $3.0B = $4.2B*
• UK marketing service industry growing at 17.4% to $7.7B
• CRM
  - Relationship marketing
  - Customer service
  - Value added programs
  - Loyalty programs
  - Culture change
*Hewson Consulting, October 2000

But - satisfaction is declining
[Figure]

And - more customers are complaining
[Figure]

Increasing customer resistance
• 98% of customer solicitations are irrelevant
• 82% of individuals would like to block all marketing access to their own data
• Campaign hit rates and customer loyalty indicators are declining

Consequently
• The ‘best’ customers are being over communicated to
• Today’s less valuable customers are not being developed into tomorrow’s ‘best’ customers
• The business potential of the customer base is not being maximized

Solution: Analytical CRM
• CRM = Customer Understanding + Relationship Management
• Analytics helps in Customer Understanding
• Analytics = OLAP, statistical analysis, data mining, etc.

Example Customer Facing Applications Helped by Analytical CRM
• Customer segmentation
• Customer loyalty building
• Customer retention/recovery

Customer segmentation
• Purpose of segmentation is to identify groups of customers with similar needs and behavior patterns, so that they can be offered more tightly focused
  - Products
  - Services
  - Communications
• Segments should be
  - Identifiable
  - Quantifiable
  - Addressable
  - Of sufficient size to be worth addressing
• Two approaches to segmentation
  - Cluster common characteristics, and then map out behavior patterns
  - Separate out behavior patterns, then identify segment characteristics

Customer base segmentation
[Figure: 2x2 matrix of potential business (high/low) versus actual business (low/high), with quadrants Develop, Retain, Observe & Incentivize, and Care & Maintenance]
• Targeted communication to each segment

Segmentation by value
[Figure]

Express profits as deciles, and ask questions
[Figure: profit (-1200 to 1200) by customer decile]
• Should the focus be on retaining wallet share from segments 8 – 10? Or on gaining from segments 1 – 4?
• Who are these customers; what do they look like?
• Are these worth keeping? Can we service them with a lower cost channel? What can we do to make this segment profitable?
• Middle 60%, either side of break even. What can we do about these?
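
The decile view of profit can be sketched in a few lines. This is a minimal illustration, not the method used in the slides: the function name `profit_deciles`, the sample profit figures, and the numbering direction (decile 1 = least profitable, decile 10 = most profitable) are all assumptions made for the example.

```python
def profit_deciles(profits):
    """Sort customers by profit (lowest first) and split them into 10 groups.

    Returns a list of (decile_number, total_profit) pairs.
    Assumption: decile 1 holds the least profitable customers.
    """
    ranked = sorted(profits)
    n = len(ranked)
    deciles = []
    for d in range(10):
        start = d * n // 10          # first customer index in this decile
        end = (d + 1) * n // 10      # one past the last index
        deciles.append((d + 1, sum(ranked[start:end])))
    return deciles


# Hypothetical per-customer annual profits (made up for illustration).
profits = [900, 750, 400, 300, 120, 80, 60, 40, 25, 10,
           5, 0, -5, -10, -30, -60, -150, -300, -500, -800]
deciles = profit_deciles(profits)

for number, total in deciles:
    print(f"decile {number:2d}: total profit {total:6d}")
```

With a table like this, the questions on the slide become concrete: which deciles sit below break-even, and how much of the total profit the top deciles carry.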

Customer loyalty: close relationships are more profitable
[Figure]

Relationship intensity and defection odds
• Evidence suggests that customer ‘lock in’ occurs once 4 or more products are purchased
[Figure: odds of not defecting by number of products purchased (1, 2, 3, 4); values shown: 98.3%, 18.1%, 10.2%, 1.1%]

A difference of opinion …
[Figure: company view versus customer view; values shown: 90%, 70%, 32%, 2%]
• Customers are happy with our customer service
• We research customer service needs and wants as part of our customer service improvement
• Customer service needs no improvement
• Customer service today is better than ever

… and action
[Figure: company view versus customer view; values shown: 98%, 43%, 7%]
• We want to develop a relationship with our customers
• We want to form and develop a relationship with our suppliers
• The relationship now is stronger than 12 months ago

Increasing propensity to buy over a customer life cycle
Actions which build relationship warmth
• No-fault service
• “Have a nice day”
• Targeted sales
[Figure: customer relationship profitability]

Loyalty is built through a virtuous circle of new customer experience
[Figure: virtuous circle of customer experience. Stages: superlative customer service; individualized and helpful dialog; innovative new products. Effects: provides legitimacy to offer advice (shown twice); excites the customer and builds loyalty]

Lifetime Impact of Customer Loyalty
[Figure: customer value over time, showing customer potential, “maximized” customer value, and “realized” customer value]

Managing Credit-Card Retention in the Pacific Rim
• Behavioral Propensity Model based Campaigns generate New Customers
• Selective score-based phone follow-up more than doubles response
• “Event-driven” (Trans. Vol. & Value) Campaigns to stimulate initial usage of credit-card.
• Propensity model + “Event-driven” Customer Retention program identifies likely non-renewers 3 months prior to renewal, and kicks in usage stimulation program
• Different offers (“Frequent User Club” versus Premium) being tested
Impact: Over 100% improvement in both Acquisition and Retention. New market opened up.

Using Negative Events to drive Positive Sales
• Event = “ATM request for cash” is rejected due to lack of funds.
• For credit-worthy customers, unsecured personal loan is offered by mail or phone the next day!
• 30% acceptance rate of product offered.
Impact:
• Significant cross-sales of additional product
• Significant reduction in negative reactions

Analytical CRM Architecture

Analytical CRM Loop
[Figure: loop with stages Hypothesis generation, Results, Analysis, Action]

Traditional Growth of Functions in an Organization
THE PRESENT: MULTIPLE CHANNELS & DATA STORES / IMPERSONAL SERVICE
[Figure: channels (3rd party resellers, kiosk, ATM, branch, outbound call centre, inbound call centre, web, fax, email, WAP) with separate data stores]
Impact!
• IMPERSONAL
• LOW QUALITY
• UNINFORMED
• INCONSISTENT

Vision for Customer Driven CRM
THE NEAR FUTURE: MULTIPLE CHANNELS & DATA STORES / PERSONALISED SERVICE
[Figure: channels around shared DATA]
Impact!
• PERSONALISED
• HIGH QUALITY
• INFORMED
• CONSISTENT

Canonical Analytics Architecture
[Figure: data sources (operational DBs, other sources) feed the data warehouse and data marts through extract, transform, load, and refresh, with a monitor & integrator and metadata; the warehouse serves an OLAP server, which supports analysis, query/reports, and data mining tools]

Data Warehouse

Data Warehouse
• A decision support database that is maintained separately from the organization’s operational database
• A data warehouse is a subject-oriented, integrated, time-varying, non-volatile collection of data that is used primarily in organizational decision making.

Data Warehouse - Subject Oriented
• Subject oriented: oriented to the major subject areas of the corporation that have been defined in the data model.
  - E.g. for an insurance company: customer, product, transaction or activity, policy, claim, account, etc.
• Operational DB and applications may be organized differently
  - E.g. based on type of insurance: auto, life, medical, fire, ...

Data Warehouse - Integrated
• There is no consistency in encoding, naming conventions, … among different data sources
• When data is moved to the warehouse, it is converted.

Data Warehouse - Non-Volatile
• Operational data is regularly accessed and manipulated a record at a time, and updates are made to data in the operational environment.
• Warehouse data is loaded and accessed. Update of data does not occur in the data warehouse environment.

Data Warehouse - Time Variance
• The time horizon for the data warehouse is significantly longer than that of operational systems.
• Operational databases contain current value data. Data warehouse data is nothing more than a sophisticated series of snapshots, taken as of some moment in time.
• The key structure of operational data may or may not contain some element of time. The key structure of the data warehouse always contains some element of time.

Data Sources
• Data sources are often the operational systems, providing the lowest level of data.
• Data sources are designed for operational use, not for decision support, and the data reflect this fact.
• Multiple data sources are often from different systems run on a wide range of hardware, and much of the software is built in-house or highly customized.
• Multiple data sources introduce a large number of issues, such as semantic conflicts.
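
The conversion step described under "Integrated" (inconsistent encodings reconciled when data moves to the warehouse) might look roughly like the sketch below. Everything specific here is hypothetical: the source names (`billing`, `claims`, `web`), the attribute being reconciled, and the warehouse convention are invented for illustration only.

```python
# Hypothetical source-specific encodings of the same attribute (gender).
GENDER_MAPS = {
    "billing": {"m": "M", "f": "F"},
    "claims": {"1": "M", "0": "F"},
    "web": {"male": "M", "female": "F"},
}


def to_warehouse(source, record):
    """Convert one source record to a single (assumed) warehouse convention."""
    raw = str(record["gender"]).strip().lower()
    return {
        "customer_id": int(record["id"]),
        # Values the source mapping does not know are flagged, not guessed.
        "gender": GENDER_MAPS[source].get(raw, "UNKNOWN"),
        "source": source,
    }


rows = [
    to_warehouse("billing", {"id": "17", "gender": "m"}),
    to_warehouse("claims", {"id": 42, "gender": 0}),
    to_warehouse("web", {"id": "8", "gender": "Female"}),
    to_warehouse("web", {"id": "9", "gender": "n/a"}),
]
for row in rows:
    print(row)
```

Keeping the per-source mappings in one table makes the semantic conflicts explicit, which is the kind of information the metadata repository is meant to hold.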

Data Cleaning
• Important to warehouse clean data (operational data from multiple sources are often dirty).
• Three classes of tools
  - Data migration: allows simple data transformation
  - Data scrubbing: uses domain-specific knowledge to scrub data
  - Data auditing: discovers rules and relationships by scanning data (detects outliers).

Load and Refresh
• Loading the warehouse includes some other processing tasks: checking integrity constraints, sorting, summarizing, building indexes, etc.
• Refreshing a warehouse means propagating updates on source data to the data stored in the warehouse
  - When to refresh: determined by usage, types of data source, etc.
  - How to refresh
    - data shipping: using triggers to update snapshot log table and propagate the updated data to the warehouse
    - transaction shipping: shipping the updates in the transaction log

Monitor
• Detect changes to an information source that are of interest to the warehouse
  - define triggers in a full-functionality DBMS
  - examine the updates in the log file
  - write programs for legacy systems
• Propagate the change in a generic form to the integrator

Integrator
• Receive changes from the monitors
  - make the data conform to the conceptual schema used by the warehouse
• Integrate the changes into the warehouse
  - merge the data with existing data already present
  - resolve possible update anomalies

Metadata Repository
• Administrative metadata
  - source databases and their contents
  - gateway descriptions
  - warehouse schema, view and derived data definitions
  - dimensions and hierarchies
  - pre-defined queries and reports
  - data mart locations and contents
  - data partitions
  - data extraction, cleansing, transformation rules, defaults
  - data refresh and purge rules
  - user profiles, user groups
  - security: user authorization, access control

Metadata Repository
• Business data
  - business terms and definitions
  - ownership of data
  - charging policies
• Operational metadata
  - data lineage: history of migrated data and sequence of transformations applied
  - currency of data: active, archived, purged
  - monitoring information: warehouse usage statistics, error reports, audit trails

Data Marts
• A data mart (departmental data warehouse) is a specialized system that brings together the data needed for a department or related applications.
• Data marts can be implemented within the data warehouse by creating special, application-specific views.
• Data marts can also be implemented as materialized views: departmental subsets that focus on selected subjects.
• Data marts may use different data representations and include their own OLAP engines

Other Tools
• User interface that allows users to interact with the warehouse
  - query and reporting tools
  - analysis tools
  - data mining tools

Dimensional Data Modeling

Conceptual Modeling of Data Warehouses
• Modeling data warehouses: dimensions & measurements
  - Star schema: A single object (fact table) in the middle connected to a number of objects (dimension tables)
  - Snowflake schema: A refinement of star schema where the dimensional hierarchy is represented explicitly by normalizing the dimension tables.
  - Fact constellations: Multiple fact tables share dimension tables.
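
A star schema as defined above can be tried out with Python's built-in `sqlite3` module. This is only a sketch under stated assumptions: the table names (`sales_fact`, `product_dim`, `store_dim`), the reduced column set, and the sample rows are invented for the example and are not taken from any real warehouse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: one row per product / store (hypothetical columns).
CREATE TABLE product_dim (product_no INTEGER PRIMARY KEY, prod_name TEXT, category TEXT);
CREATE TABLE store_dim   (store_id   INTEGER PRIMARY KEY, city TEXT, country TEXT);

-- Fact table in the middle: foreign keys to each dimension plus measurements.
CREATE TABLE sales_fact (
    product_no   INTEGER REFERENCES product_dim(product_no),
    store_id     INTEGER REFERENCES store_dim(store_id),
    unit_sales   INTEGER,
    dollar_sales REAL
);

INSERT INTO product_dim VALUES (1, 'TV', 'Electronics'), (2, 'Cola', 'Beverages');
INSERT INTO store_dim   VALUES (10, 'Minneapolis', 'USA'), (20, 'Tokyo', 'Japan');
INSERT INTO sales_fact  VALUES (1, 10, 2, 800.0), (2, 10, 5, 10.0), (1, 20, 1, 450.0);
""")

# A typical star join: aggregate a measurement by attributes of two dimensions.
rows = conn.execute("""
    SELECT p.category, s.country, SUM(f.dollar_sales)
    FROM sales_fact f
    JOIN product_dim p ON p.product_no = f.product_no
    JOIN store_dim   s ON s.store_id   = f.store_id
    GROUP BY p.category, s.country
    ORDER BY p.category, s.country
""").fetchall()

for category, country, total in rows:
    print(category, country, total)
```

In a snowflake variant, `store_dim` would keep only `city` and point to separate city, state, and country tables, at the cost of extra joins in the query.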

Example of Star Schema
Sales Fact Table
  Keys: Date, Product, Store, Cust
  Measurements: unit_sales, dollar_sales, Yen_sales
Dimension tables
  Date: Date, Month, Year
  Product: ProductNo, ProdName, ProdDesc, Category, QOH
  Store: StoreID, City, State, Country, Region
  Customer: CustId, CustName, CustCity, CustCountry

Example of Snowflake Schema
Sales Fact Table
  Keys: Date, Product, Store, Cust
  Measurements: unit_sales, dollar_sales, Yen_sales
Dimension tables
  Date: Date, Month
  Month: Month, Year
  Year: Year
  Product: ProductNo, ProdName, ProdDesc, Category, QOH
  Store: StoreID, City
  City: City, State
  State: State, Country
  Country: Country, Region
  Customer: CustId, CustName, CustCity, CustCountry

A Query Model
[Figure: query model with dimensions Customer Orders, Shipping Method, Customer, Time, Product, Geography, Promotion, and Organization; levels shown include CONTRACTS, ORDER, AIR-EXPRESS, TRUCK, ANNUALLY, QTRLY, DAILY, PRODUCT LINE, PRODUCT ITEM, PRODUCT GROUP, REGION, DISTRICT, COUNTRY, DIVISION, SALES PERSON]

Summary Tables
• Data warehouse may store some selected summary data, the pre-aggregated data.
• Summary data can be stored as separate fact tables sharing the same dimension tables with the base fact table.
• Summary data can be encoded in the original fact table and dimension tables.

Date dimension table
  id  level  date  month  year
  0   1      1     1      1998
  1   2      NULL  1      1998
  2   2      NULL  2      1998
  3   3      NULL  NULL   1998

Fact table
  DateID  ProdID  Sales
  0       1       1000
  1       1       20000
  1       2       40000
  3       1       300000

Multidimensional Data
• Sales volume as a function of product, time, and geography
• Dimensions: Product, Region, Week
• Hierarchical summarization paths
  - Industry, Category, Product
  - Country, Region, City, Office
  - Year, Quarter, Month or Week, Day
[Figure: cube with axes Product, Region, and Month]

A Sample Data Cube
[Figure: data cube with dimensions Product (TV, PC, VCR, sum), Date (1Qtr, 2Qtr, 3Qtr, 4Qtr, sum), and Country (China, India, Japan, sum); the highlighted cell holds the total annual sales of TV in China]

On-Line Analytical Processing (OLAP)

Sample Operations
• Roll up: summarize data
  - total sales volume last year by product category by region
• Roll down, drill down, drill through: go from higher level summary to lower level summary or detailed data
  - For a particular product category, find the detailed sales data for each salesperson by date
• Slice and dice: select and project
  - Sales of beverages in the West over the last 6 months
• Pivot: reorient cube

Cube Operation
  SELECT date, product, customer, SUM (amount)
  FROM SALES
  CUBE BY date, product, customer
• Need to compute the following group-bys:
  (date, product, customer),
  (date, product), (date, customer), (product, customer),
  (date), (product), (customer),
  ()

Cuboid Lattice
• A data cube can be viewed as a lattice of cuboids. For R(A, B, C, D), the upper levels are:
  (A,B,C,D)
  (A,B,C) (A,B,D) (A,C,D) (B,C,D)
  (A,B) (A,C) (A,D) (B,C) (B,D) (C,D)
• The bottommost cuboid is the base cube.
• The topmost cuboid contains only one cell.
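
The set of group-bys behind a CUBE BY over three attributes can be enumerated directly. The sketch below is a naive illustration, not how an OLAP server computes a cube: it rescans the base rows once per cuboid, and the function name `cube` and the sample sales rows are assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations


def cube(rows, dims, measure):
    """Return {group_by_dims: {key_values: summed_measure}} for every subset of dims."""
    result = {}
    # Walk from the most detailed cuboid (all dims) down to the empty group-by.
    for k in range(len(dims), -1, -1):
        for group in combinations(dims, k):
            totals = defaultdict(float)
            for row in rows:
                key = tuple(row[d] for d in group)
                totals[key] += row[measure]
            result[group] = dict(totals)
    return result


# Hypothetical base data.
sales = [
    {"date": "1Qtr", "product": "TV", "customer": "c1", "amount": 100.0},
    {"date": "1Qtr", "product": "PC", "customer": "c2", "amount": 250.0},
    {"date": "2Qtr", "product": "TV", "customer": "c1", "amount": 150.0},
]
cuboids = cube(sales, ("date", "product", "customer"), "amount")

print(len(cuboids))                      # number of group-bys: 2**3
print(cuboids[("product",)])             # sales by product
print(cuboids[()])                       # grand total, the single-cell cuboid
```

Three dimensions give 2^3 = 8 group-bys, including the empty one that holds the grand total. Practical algorithms avoid the repeated scans by deriving each cuboid from a smaller, already computed parent in the lattice.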
Remaining cuboids of the lattice: (A) (B) (C) (D), and ( all )

Cube Computation -- Array Based Algorithm
• A MOLAP approach: the base cuboid is stored as a multidimensional array
• Read in a number of cells to compute partial cuboids
[Figure: array with dimensions A, B, C and the cuboids {ABC}, {AB}, {AC}, {BC}, {A}, {B}, {C}, {}]

ROLAP versus MOLAP
ROLAP
• Exploits services of relational engine effectively
• Provides additional OLAP services
  - design tools for DSS schema
  - performance analysis tool to pick aggregates to materialize
• SQL gets in the way of sequential processing and columnar aggregation
• Some queries are hard to formulate and can often be time consuming to execute

ROLAP versus MOLAP
MOLAP
• The storage model is an n-dimensional array
• Front-end multidimensional queries map to server capabilities in a straightforward way
• Direct addressing abilities
• Handling sparse data in array representation is expensive
• Poor storage utilization when the data is sparse

Data Mining

What Is Data Mining?
• Data mining (knowledge discovery in databases):
  - Extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) information from data in large databases
• Alternative names and their “inside stories”:
  - Data mining: a misnomer?
  - Knowledge discovery in databases (KDD: SIGKDD), knowledge extraction, data archeology, data dredging, information harvesting, business intelligence, etc.
• What is not data mining?
  - (Deductive) query processing.
  - Expert systems or small ML/statistical programs

Examples of Interesting Knowledge
• Association rules
  - 98% of people who purchase diapers also buy beer
• Classification
  - People with age less than 25 and salary > 40k drive sports cars
• Similar time sequences
  - Stocks of companies A and B perform similarly
• Outlier detection
  - Residential customers for telecom company with businesses at home

Motivation: “Necessity is the Mother of Invention”
• Data explosion problem:
  - Automated data collection tools and mature database technology lead to tremendous amounts of data stored in databases, data warehouses and other information repositories.
• We are drowning in data, but starving for knowledge!
• Data warehousing and data mining:
  - On-line analytical processing
  - Extraction of interesting knowledge (rules, regularities, patterns, constraints) from data in large databases.

Data Mining and Business Intelligence
[Figure: layered pyramid, with increasing potential to support business decisions toward the top]
Layers, top to bottom:
• Making Decisions
• Data Presentation: Visualization Techniques
• Data Mining: Information Discovery
• Data Exploration: Statistical Analysis, Querying and Reporting
• Data Warehouses / Data Marts: OLAP, MDA
• Data Sources: Paper, Files, Information Providers, Database Systems, OLTP
Roles, top to bottom: End User, Business Analyst, Data Analyst, DBA

Data Mining: Confluence of Multiple Disciplines
• Database systems, data warehouse and OLAP
• Statistics
• Machine learning
• Visualization
• Information science
• High performance computing
• Other disciplines: neural networks, mathematical modeling, information retrieval, pattern recognition, etc.

The Data Mining Process

Data Mining: A KDD Process
• Data mining: the core of knowledge discovery process.
[Figure: KDD process stages, from source to result: Databases; Data Cleaning and Data Integration; Data Warehouse; Selection; Task-relevant Data; Data Mining; Pattern Evaluation]

Steps of a KDD Process
• Learning the application domain: relevant prior knowledge and goals of application
• Creating a target data set: data selection
• Data cleaning and preprocessing (may take 60% of effort!)
• Data reduction and projection: find useful features, dimensionality/variable reduction, invariant representation.
• Choosing functions of data mining: summarization, classification, regression, association, clustering.
• Choosing the mining algorithm(s)
• Data mining: search for patterns of interest
• Interpretation: analysis of results; visualization, transformation, removing redundant patterns, etc.
• Use of discovered knowledge.

Data Mining – Some Issues to Consider

Three Schemes in Classification
• Knowledge to be mined:
  - Summarization (characterization), comparison, association, classification, clustering, trend, deviation and pattern analysis, etc.
  - Mining knowledge at different abstraction levels: primitive level, high level, multiple-level, etc.
• Databases to be mined:
  - Relational, transactional, object-oriented, object-relational, active, spatial, time-series, text, multimedia, heterogeneous, legacy, etc.
• Techniques adopted:
  - Database-oriented, data warehouse (OLAP), machine learning, statistics, visualization, neural network, etc.

Data Mining: Classification Schemes
• General functionality:
  - Descriptive data mining
  - Predictive data mining
• Different views, different classifications:
  - Kinds of knowledge to be discovered,
  - Kinds of databases to be mined, and
  - Kinds of techniques adopted.

Data Mining Functionality
• Concept description: characterization and comparison:
  - Generalize, summarize, and possibly contrast data characteristics, e.g., dry vs. wet regions.
• Association:
  - From association, correlation, to causality.
  - Finding rules like “inside(x, city) → near(x, highway)”.
• Classification and Prediction:
  - Classify data based on the values in a classifying attribute, e.g., classify countries based on climate, or classify cars based on gas mileage.
  - Predict some unknown or missing attribute values based on other information.

Data Mining Functionality (Cont.)
• Clustering:
  - Group data to form new classes, e.g., cluster houses to find distribution patterns.
• Time-series analysis:
  - Trend and deviation analysis: Find and characterize evolution trend, sequential patterns, similar sequences, and deviation data, e.g., stock analysis.
  - Similarity-based pattern-directed analysis: Find and characterize user-specified patterns in large databases.
  - Cyclicity/periodicity analysis: Find segment-wise or total cycles or periodic behaviours in time-related data.
• Other pattern-directed or statistical analysis

Data Mining: On What Kind of Data?
• Relational databases
• Data warehouses
• Transactional databases
• Advanced DB systems and information repositories
  - Object-oriented and object-relational databases
  - Spatial databases
  - Time-series data and temporal data
  - Text databases and multimedia databases
  - Heterogeneous and legacy databases
  - WWW

Are All the “Discovered” Patterns Interesting?
• A data mining system/query may generate thousands of patterns; not all of them are interesting.
  - Suggested approach: query-based, focused mining
• Interestingness measures: a pattern is interesting if it is
  - easily understood by humans
  - valid on new or test data with some degree of certainty
  - potentially useful
  - novel, or validates some hypothesis that a user seeks to confirm
• Objective vs. subjective interestingness measures:
  - Objective: based on statistics and structures of patterns, e.g., support, confidence, etc.
  - Subjective: based on user’s beliefs in the data, e.g., unexpectedness, novelty, etc.

Can It Find All and Only Interesting Patterns?
• Find all the interesting patterns: Completeness.
  - Can a data mining system find all the interesting patterns?
• Search for only interesting patterns: Optimization.
  - Can a data mining system find only the interesting patterns?
  - Approaches
    - First generate all the patterns and then filter out the uninteresting ones.
    - Generate only the interesting patterns: mining query optimization

Requirements and Challenges in Data Mining
• Mining methodology issues
  - Mining different kinds of knowledge in databases.
  - Interactive mining of knowledge at multiple levels of abstraction.
  - Incorporation of background knowledge
  - Data mining query languages and ad-hoc data mining.
  - Expression and visualization of data mining results.
  - Handling noise and incomplete data
  - Pattern evaluation: the interestingness problem.
• Performance issues:
  - Efficiency and scalability of data mining algorithms.
  - Parallel, distributed and incremental mining methods.

Requirements/Challenges in Data Mining (Cont.)
• Issues relating to the variety of data types:
  - Handling relational and complex types of data
  - Mining information from heterogeneous databases and global information systems.
• Issues related to applications and social impacts:
  - Application of discovered knowledge.
    - Domain-specific data mining tools
    - Intelligent query answering
    - Process control and decision making.
  - Integration of the discovered knowledge with existing knowledge: a knowledge fusion problem.
  - Protection of data security and integrity.

Amazon.com: Case study in building customer loyalty

The continuing relationship … Amazon.com “Loyalty” model
• Need creation: anticipate/stimulate
• Information search: provide/assist
• Evaluate alternatives: assist/negate
• Purchase transaction: optimise/reward
• Post purchase experience: add value

Need Creation (attract to website)
• Need creation: anticipate/stimulate

Further Need Creation (upon reaching website)
[Figure]

Information Search
• Information search: provide/assist

Evaluation of Alternatives
• Evaluate alternatives: assist/negate

Purchase Optimisation/Reward
• Purchase transaction: optimise/reward
• 1-click purchase
• ‘slippery check out counter’ vs. ‘sticky aisles’

Post-purchase experience
• Post purchase experience: add value

Account Management
[Figure]

Why is loyalty important
• Amazon’s ‘customer lifetime value’ model (for book buyers)
  - Average $50 for first time purchase
  - Average $40 per visit thereafter
  - Average of one visit per 2 months
  - Assume customer will be active for 10 years – not validated yet ☺
• ‘4 buys and you are hooked’ empirical law
• Use Alexa data to bring back ‘prodigal sons’ (and daughters)

Build more loyalty faster
[Figure: “loyalty” LTV versus time]

The ‘Virtuous Cycle’
[Figure: cycle of purchase response, buying decision/process, and customer knowledge]

Internet Marketing Insight – Jeff Bezos
• Role of advertisement – get customer to the store
• Customer experience – get customer to buy
• Brick & mortar stores
  - Getting customer to store is the hard part
  - Shopping cart abandonment is not common, since the overhead of going to another store is very high – especially in Minnesota winters!
  - Marketing expenses: 80% for advertisement; 20% for customer experience
• The 80-20 rule is reversed for on-line stores – Jeff Bezos

Remarks on Amazon.com
• A very innovative company – the poster child for e-commerce
• Is pushing the envelope in personalization
• Customers love it
• Will it make money – we’re all waiting to see
• A company of the future, with a product of the past, in a market of the present

The Analytics Behind e-Marketing

Web Logs – Record of consumer behavior
looney.cs.umn.edu han - [09/Aug/1996:09:53:52 -0500] "GET mobasher/courses/cs5106/cs5106l1.html HTTP/1.0" 200
mega.cs.umn.edu njain - [09/Aug/1996:09:53:52 -0500] "GET / HTTP/1.0" 200 3291
mega.cs.umn.edu njain - [09/Aug/1996:09:53:53 -0500] "GET /images/backgnds/paper.gif HTTP/1.0" 200 3014
mega.cs.umn.edu njain - [09/Aug/1996:09:54:12 -0500] "GET /cgi-bin/Count.cgi?df=CS home.dat\&dd=C\&ft=1 HTTP
mega.cs.umn.edu njain - [09/Aug/1996:09:54:18 -0500] "GET advisor HTTP/1.0" 302
mega.cs.umn.edu njain - [09/Aug/1996:09:54:19 -0500] "GET advisor/ HTTP/1.0" 200 487
looney.cs.umn.edu han - [09/Aug/1996:09:54:28 -0500] "GET mobasher/courses/cs5106/cs5106l2.html HTTP/1.0" 200
... ... ...

Access Log Format
  IP address  userid  time  method  url  protocol  status  size
  mega.cs.umn.edu njain 09/Aug/1996:09:54:31 advisor/csci-faq.html
• Other server logs: referrer logs, agent logs
• Application server logs: business event logging

Shopping Pipeline Analysis
[Figure: state transition diagram with states Enter store, Browse catalog, Select items, Complete purchase; ‘sticky’ states carry cross-sell and up-sell promotions; ‘slippery’ state, i.e. 1-click buy]
Overall goal:
• Maximize probability of reaching final state
• Maximize expected sales from each visit
Approach:
• Shopping pipeline modeled as state transition diagram
• Sensitivity analysis of state transition probabilities
• Promotion opportunities identified
• E-metrics and ROI used to measure effectiveness

Original Amazon Model for Customer Segmentation
[Figure: dollars spent in past quarter (500 to 1500) versus number of purchases in past quarter (1 to 7), with regions for light, medium, heavy, and super heavy buyers; Customer M is medium, Customer H is heavy]

Data Driven Customer Segmentation Model
• Modeled customers in a 4-dim space: recency, frequency, monetary, tenure
• Used PCA to determine relative weights of each dimension
• Composite Score = w1*recency + w2*frequency + w3*monetary + w4*tenure

Customer Score Interpretation
          Recency   Frequency   Monetary   Tenure      Composite Score
  …       …         …           …          …           …
  Cust M  10 days   4 times     $480       3 months    80%
  …       …         …           …          …           …
  Cust H  30 days   2 times     $900       10 months   72%
  …       …         …           …          …           …
• Cust M => frequent visitor but low spender => potential for acquiring higher wallet share => focus on improving relationship
• Cust H => infrequent visitor but heavy spender => focus on sustaining relationship

Yodlee.com: Case study in web business intelligence

Current Situation: Consumer Confusion
• “It takes me two hours to get to all my accounts”
• “I can’t look at my assets across accounts”
• “I can’t remember all my user IDs and passwords”
• “I want the web to work for me, not the other way around”
• “This is overwhelming……I need some help”
• “Make it easier for me!”

Solution – Personal Information Aggregation
[Figure]

Aggregation Service Model
[Figure: content partners (communication site, finance site, travel site) feed an aggregation service provider whose capabilities are content acquisition; aggregation, analysis, personalization; applications; and presentation & interaction. Mobile and connected users reach it through AOL, AOLfinance, MyCiti, and Citibank]
Citibank; Applications; Presentation & Interaction

Business Intelligence Benefits to Corporation
• ‘Tip-of-the-iceberg’ analysis for a brokerage house
• Lifestyle preference analysis of banking customers for a survey
• ‘True-wallet-share’ analysis for a credit card organization
• Dynamic targeting for banner advertisements, e-mail campaigns, etc.

‘Tip-of-the-Iceberg’ Analysis for a Brokerage House
Asset Based Tiers    Number of Users
< $20K               7579
$20K - $100K         2539
$100K - $500K        1994
$500K - $1M          525
$1M - $5M            547
$5M - $25M           106
> $25M               9
• This brokerage house treated customers with net worth > $1M as ‘high net worth’ (HNW) customers with specialized services
• Almost none of the customers in the green region had > $1M with this brokerage

Household Lifestyle Preference Analysis for a Survey
Financial Preferences
- 53% have at least one online banking account
- 51% have an online credit card account, higher than Yodlee users as a whole
- 31% also have an E*Trade account, and 11% also have a Schwab account
- Have a preference for FirstUSA over Citibank, the opposite preference for users as a whole
- The most popular credit card is American Express
Lifestyle Preferences
- 25% make travel reservations online, fewer than users as a whole
- Expedia is more popular as an online travel site than Travelocity
- 49% have a frequent flier account, higher than users as a whole
- The favorite frequent flier programs are United, Delta, American, in that order
- Half as many co-brand users shop on eBay as users as a whole

‘True-Wallet-Share’ Analysis for a Credit Card Organization
Range < $100 Total Users 462 Discover 4.13 American Express -467.40 (152) Mastercard 0 Visa -29.76 (87) Other -60.29 (272) Average Total -190.74 $100 - $200 232 -12.61 (73)(39) 120.17 (66) 0 89.95 167.10 (156) 149.44 $200 - $500 $500 - $1000 $1000 $2000 $2000 $5000 $5000 $10000 $10000+ 643 968 1386 36.97 (107) 75.57 (182) 174.55 (292) 253.77 (207) 571.09 (378) 988.97 (540) 0 0 837.25 263.27 (432) (1) 957.69 1732 620.80 (354) 1696 1332.48 (452) 2156.30 (1099) 4091.64 (814) 10111.75 (1010) 272.42 (421) 623.36 (593) 1078.01 (866) 2358.22 (1579) 4966.61 (1200) 14649.52 (1341) 342.99 893.47 1471.38 2422 (40)(135) 218.93 597.83 (217) 1018.50 (323) 2087.75 (601) 3976.93 (483) 8934.39 (642) (1) 3648.40 (3) 1921.16 3297.58 7100.20 22329.56 (9)
Analysis of credit card balance habits of user base
• There are 1386 people, each of whom carries a total balance between $1000 and $2000 on all credit cards that (s)he owns
• 292 of these 1386 people own Discover cards, and carry an average balance of $174.55
• 540 of these 1386 people own AmEx cards, with an average balance of $988.97
• 323 of these 1386 people carry one or more Visa cards, with an average Visa network balance of $1018.50

Business Implications of True Wallet Share Analysis
• A credit card offeror knows exactly how much money customers holding its cards spend (every month) on its card vs. on the competition’s cards
• Offeror can target users falling within various segments for specific customer acquisition, retention, etc. purposes
• Detailed profile and history information of these users can be used for precision targeting and customer messaging through various channels including ad serving, e-mail campaigns, promotions, etc.
• If transaction level detail information of these users is analyzed, it can be determined exactly which credit cards are being used by aggregation users as a whole for what kind of lifestyle activity, e.g. travel, entertainment, shopping, groceries, etc.; this can help the partner decide which market segments to focus on

Business Implications (contd.)
• The analysis above, if carried out at the individual user level of detail, can be used to target individual customers with specific promotions, etc.
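
A minimal sketch of the individual-level wallet-share calculation described above, assuming the aggregated data is available as one record per user and card network. The record layout, the function names, and the sample users are illustrative assumptions, not part of the aggregation service; only the balance ranges come from the table, and treating each lower bound as inclusive is also an assumption.

```python
from bisect import bisect_right
from dataclasses import dataclass

# Range boundaries as used in the true-wallet-share table.
RANGE_EDGES = [100, 200, 500, 1000, 2000, 5000, 10000]
RANGE_LABELS = ["< $100", "$100 - $200", "$200 - $500", "$500 - $1000",
                "$1000 - $2000", "$2000 - $5000", "$5000 - $10000", "$10000+"]


@dataclass
class CardBalance:
    user_id: str    # hypothetical user identifier
    network: str    # e.g. "Discover", "American Express", "Visa"
    balance: float  # balance carried on this network


def total_by_user(records):
    """Sum balances per user over the given records."""
    totals = {}
    for rec in records:
        totals[rec.user_id] = totals.get(rec.user_id, 0.0) + rec.balance
    return totals


def balance_range(total):
    """Map a user's total balance to its range label from the table."""
    return RANGE_LABELS[bisect_right(RANGE_EDGES, total)]


def wallet_share(records, network):
    """Fraction of each user's total card balance carried on `network`.

    Users whose total balance is zero or negative are skipped, since a
    share is not meaningful for them.
    """
    records = list(records)  # allow any iterable; we pass over it twice
    totals = total_by_user(records)
    on_network = total_by_user(r for r in records if r.network == network)
    return {user: on_network.get(user, 0.0) / total
            for user, total in totals.items() if total > 0}


if __name__ == "__main__":
    # Made-up sample data for illustration only.
    sample = [
        CardBalance("u1", "American Express", 900.0),
        CardBalance("u1", "Visa", 600.0),
        CardBalance("u2", "Discover", 150.0),
        CardBalance("u2", "Visa", 50.0),
    ]
    shares = wallet_share(sample, "American Express")
    for user, total in total_by_user(sample).items():
        print(user, balance_range(total), f"{shares.get(user, 0.0):.0%}")
```

Grouping the per-user shares by `balance_range` would give a segment-level view comparable to the table, while the per-user dictionary supports the individual targeting mentioned above.
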
• Transaction level detail can be classified into charges to specific organizations, department stores, airlines, etc. This will identify the top organizations that aggregation users spend money at, either on the partner’s card or on a competing network. This would be useful in determining which organizations to partner with for customer retention and acquisition, respectively
• All of these analyses, if performed periodically and tracked over time, can provide valuable insight into the evolving credit balance distribution and usage behavior at the user population or individual user level

Targeted Ad Serving

Targeted Ad Serving (contd.)

Privacy Issues

Let’s begin with some real examples …

Problem: Shopping for spouse’s anniversary – too much clutter

Solution: Focused and relevant advertisement

Problem: Tired of mistreatment by financial institutions …
• You have tons of money in your investment portfolio
• But you are over-worked and missed a couple of credit card payment deadlines – after all you are busy managing your investment portfolio ☺
• Credit card institution treats you like a deadbeat

Solution
• Why not let the credit card institution know what your investment portfolio balance is? Impress them ☺
• Perhaps even authorize the credit card company to transfer funds from your investment account to cover the payment? Or maybe not ☺

So, what’s the catch…
• Shopping example
  - Allow the vendor to collect detailed information about you and build an accurate profile
  - Junk mail is only a nuisance for the receiver, but an expense for the sender! The sender wants to avoid it more than the receiver!!
• Credit card example
  - Allow the credit card company and investment company to share your information
• Multiple online accounts example
  - Hand over your account names and passwords to aggregation service
  - Sounds scary – but over 1.5 million people have done this in about 18 months’ time!!

Let’s now talk about privacy …
• Merriam Webster definition
  - a: the quality or state of being apart from company or observation
  - b: freedom from unauthorized intrusion
• Justice Louis Brandeis
  - “the right to be let alone”
• Operational definition
  - Collection and analysis of personal data beyond some limit

Public Attitude Towards Privacy
• A (self-professed) non-scientific study carried out by a USA Today reporter
  - Asked 10 people the following two questions
  - Are you concerned about privacy? 8 said YES
  - If I buy you a Big Mac, can I keep the wrapper (to get fingerprints)? 8 said YES
• ACM E-Commerce 2001 paper [Spiekermann et al]
  - Most people willing to answer fairly personal questions to anthropomorphic web-bot, even though not relevant to the task at hand
  - Different privacy policies had no impact on behavior
  - Study carried out in Europe, where privacy consciousness is (presumably) higher

Public Attitude (contd.)
• Amazon.com (and practically every commercial site) uses cookies to identify and track visitors
  - 97.6% of Amazon.com customers accepted cookies
• Airline frequent flier programs with cross promotions
  - We willingly agree to be tracked
  - Get upset if the tracking fails!
• Over 1.5 million people have trusted the aggregation service (called Yodlee) with the names and passwords of their financial accounts in less than 18 months
  - Adoption rate has been over 3 times the most optimistic projections
• Medical data is (perhaps) an exception to this

What people really want
• Some people will not share any kind of private data at any cost – the ‘paranoids’
• Some people will share any data for returns – the ‘Jerry Springerites’
• The vast majority in the middle wants
  - A reasonable level of comfort that private data about them will NOT be misused
  - Tangible and compelling benefits in return for sharing their private data – Big Mac example, frequent flier programs

Remarks on Privacy
• Is it ‘much ado about nothing’?
  - If indeed data collection were outlawed, and thus personalization impossible, wouldn’t the public lose – faced with generic, undifferentiated products/services?
  - Given the public’s attitude about privacy (as shown in their actions), are privacy advocates barking up the wrong tree?
  - Is it just a matter of time or a generational issue, e.g. adoption of credit cards?
• Where do we stand?
  - Current position - loss of your privacy may be beneficial for you
  - Emerging position (post September 11th) - loss of your privacy will be beneficial for everyone
  - Critical emerging debate - is privacy a right or a privilege?

Concluding Remarks
• Internet is a high bandwidth, low latency, negligible cost, interactive channel to the customer
• Very high adoption rates for this channel
• Processing speeds and storage capacities continuing to increase while costs continue to fall
• Data analytics technology has grown rapidly
• Customer facing applications are ready for a paradigm shift
• Innovative companies have moved ahead
• Privacy is an issue, but not much of a concern