Data Mining and Market Intelligence for Optimal Marketing Returns

Susan Chiu
Domingo Tavella

AMSTERDAM • BOSTON • HEIDELBERG • LONDON • NEW YORK • OXFORD
PARIS • SAN DIEGO • SAN FRANCISCO • SINGAPORE • SYDNEY • TOKYO

Butterworth-Heinemann is an imprint of Elsevier
Linacre House, Jordan Hill, Oxford OX2 8DP, UK
30 Corporate Drive, Suite 400, Burlington, MA 01803, USA

First edition 2008

Copyright © 2008 Susan Chiu and Domingo Tavella. Published by Elsevier Inc. All rights reserved

The right of Susan Chiu and Domingo Tavella to be identified as the authors of this work has been asserted in accordance with the Copyright, Designs and Patents Act 1988

No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means electronic, mechanical, photocopying, recording or otherwise without the prior written permission of the publisher.

Permissions may be sought directly from Elsevier’s Science & Technology Rights Department in Oxford, UK: phone (+44) (0) 1865 843830; fax (+44) (0) 1865 853333; email: permissions@elsevier.com. Alternatively you can submit your request online via the Elsevier homepage (http://elsevier.com), by selecting “Support & Contact”, then “Copyright and Permission”, and then “Obtaining Permission”.

Notice
No responsibility is assumed by the publisher for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions or ideas contained in the material herein

British Library Cataloguing in Publication Data
A catalogue record for this book is available from the British Library

Library of Congress Cataloging-in-Publication Data
A catalog record for this book is available from the Library of Congress

ISBN: 978-0-7506-8234-3
ISBN: 978-0-7506-7980-0

For information on all Butterworth-Heinemann publications visit our web site at http://books.elsevier.com

Typeset by Charon Tec Ltd., A Macmillan Company
(www.macmillansolutions.com)
Printed and bound in the United States of America

Contents

Preface
Biographies

1  Introduction
   Strategic importance of metrics, marketing research and data mining in today’s marketing world
      The role of metrics
      The role of research
      The role of data mining
   An effective eight-step process for incorporating metrics, research and data mining into marketing planning and execution
      Step 1: identifying key stakeholders and their business objectives
      Step 2: selecting appropriate metrics to measure marketing success
      Step 3: assessing the market opportunity
      Step 4: conducting competitive analysis
      Step 5: deriving optimal marketing spending and media mix
      Step 6: leveraging data mining for optimization and getting early buy-in and feedback from key stakeholders
      Step 7: tracking and comparison of metric goals and results
      Step 8: incorporating the learning into the next round of marketing planning
   Integration of market intelligence and databases
   Cultivating adoption of metrics, research and data mining in the corporate structure
      Identification of key required skills
      Creating an effective engagement process
      Promoting research and analytics

2  Marketing Spending Models and Optimization
   Marketing spending model
      Static models
      Dynamic models
   Marketing spending models and corporate finance
   A framework for corporate performance marketing effort integration

3  Metrics Overview
   Common metrics for measuring returns and investments
      Measuring returns with return metrics
      Measuring investment with investment metrics
      Developing a formula for return on investment
      Common ROI tracking challenges
   Process for identifying appropriate metrics
      Identification of the overall business objective
      Understanding the impact of a marketing effort on target audience migration
      Selection of appropriate marketing communication channels
      Identification of appropriate return metrics by stage in the sales cycle
      Differentiating return metrics from operational metrics

4  Multi-channel Campaign Performance Reporting and Optimization
   Multi-channel campaign performance reporting
   Multi-channel campaign performance optimization
   Uncovering revenue-driving factors

5  Understanding the Market through Marketing Research
   Market opportunities
      Market size
      Factors that impact market-opportunity dynamics
      Market growth trends
      Market share
   Basis for market segmentation
      Market segmentation by market size, market growth, and market share: case study one
   Using market research and data mining for building a marketing plan
      Marketing planning based on market segmentation and overall company goal: case study two
   Target-audience segmentation
      Target-audience attributes
      Types of target-audience segmentation
   Understanding route to market and competitive landscape by market segment
      Routes to market
      Competitive landscape
      Competitive analysis methods
   Overview of marketing research
      Syndicated research versus customized research
      Primary data versus secondary data
      Surveys
      Panel studies
      Focus groups
      Sampling methods
      Sample size
   Research report and results presentation
      Structure of a research report

6  Data and Statistics Overview
   Data types
   Overview of statistical concepts
      Population, sample, and the central limit theorem
      Random variables
      Probability, probability mass, probability density, probability distribution, and expectation
      Mean, median, mode, and range
      Variance and standard deviation
      Percentile, skewness, and kurtosis
      Probability density functions
      Independent and dependent variables
      Covariance and correlation coefficient
      Tests of significance
      Experimental design

7  Introduction to Data Mining
   Data mining overview
   An effective step by step data mining thought process
      Step one: identification of business objectives and goals
      Step two: determination of the key focus business areas and metrics
      Step three: translation of business issues into technical problems
      Step four: selection of appropriate data mining techniques and software tools
      Step five: identification of data sources
      Step six: conduction of analysis
      Step seven: translation of analytical results into actionable business recommendations
   Overview of data mining techniques
      Basic data exploration
      Linear regression analysis
      Cluster analysis
      Principal component analysis
      Factor analysis
      Discriminant analysis
      Correspondence analysis
      Analysis of variance
      Canonical correlation analysis
      Multi-dimensional scaling analysis
      Time series analysis
      Conjoint analysis
      Logistic regression
      Association analysis
      Collaborative filtering

8  Audience Segmentation
   Case study one: behavior and demographics segmentation
      Model building
      Model validation
   Case study two: value segmentation
      Model building
      Model validation
   Case study three: response behavior segmentation
      Model building
      Validation
   Case study four: customer satisfaction segmentation
      Model building
      Validation

9  Data Mining for Customer Acquisition, Retention, and Growth
   Case study one: direct mail targeting for new customer acquisition
      Purchase model on prospects having received a catalog
      Purchase model based on prospects not having received a catalog
      Prospect scoring
      Modeling financial impact
   Case study two: attrition modeling for customer retention
   Case study three: customer growth model

10  Data Mining for Cross-Selling and Bundled Marketing
   Association engine
   Case study one: e-commerce cross-sell
      Model building
      Model validation
   Case study two: online advertising promotions
      Model building
      Model validation

11  Web Analytics
   Web analytics overview
   Web analytic reporting overview
      Brand or product awareness generation
      Web site content management
      Lead generation
      E-commerce direct sales
      Customer support and service
   Web syndicated research

12  Search Marketing Analytics
   Search engine optimization overview
      Site analysis
      SEO metrics
   Search engine marketing overview
      SEM resources
      SEM metrics
   Onsite search overview
   Visitor segmentation and visit scenario analysis

Index

Preface

Over the last several decades, Marketing Research has been benefiting from the ever-increasing wave of quantitative innovation in fields that have been traditionally regarded as the purview of softer disciplines.
The rising level of quantitative education in the marketing research community, the extraordinary wealth of information accessible on the Internet, along with fierce competition for customers conspire to create a growing need for sophisticated applications of data-mining, statistical, and empirical methodologies to the formulation and implementation of marketing plans. As business experience is increasingly informed by the results of rigorous analysis, it becomes ever more clear that the application of quantitative modeling techniques in marketing has a direct effect on the bottom line. In the extremely competitive environment of the global economy, the potential high price of a misdirected marketing effort is made unacceptable by the abundance of information that, if properly extracted and interpreted, can guide the effort to success.

This book’s primary audience is the quantitative middle of the marketing professional spectrum. The primary objectives of the book are to distill and present a portfolio of techniques and methods of demonstrable efficacy in the design, implementation, and continued assessment of a marketing effort. The selection of techniques and the extent and depth of coverage of the quantitative background needed for their practical use have benefited from our experience in practical marketing research and quantitative modeling. The resolution of business issues and the practicality of implementation have been our most important guiding principles in covering the material.

The materials we discuss are essential components in today’s sophisticated quantitative marketing professional’s toolbox. The mathematical and statistical issues whose understanding is required to ensure the correct interpretation of the various methodologies and their outputs are introduced with minimal complexity. The emphasis is on practical applications, exemplified with case studies implemented in standard computational analysis environments, such as SAS and SPSS.

There are three main components in the coverage of the book. The first component refers to the importance and integration of marketing research, metrics, and data mining into the marketing investment process. The second is a detailed discussion of marketing research and data mining methods with a view to solving the practical needs of marketing effort design and implementation. The third thrust of the book is the application of the methodology to illustrative case studies, representative of the common practical challenges marketing professionals confront.

San Francisco
September 2007
Susan Chiu
Domingo Tavella

Biographies

Susan Chiu
Susan Chiu is currently Director of Business Intelligence at Ingram Micro, Inc., where she is responsible for advanced analytics and marketing research consulting. Susan Chiu has over 15 years of quantitative marketing research experience and has held positions in analytics, data mining, and business intelligence with Cisco Systems, Wells Fargo, Providian Bancorp, and Safeway Corporation. Susan Chiu has a Masters degree in Statistics from Stanford University.

Domingo Tavella
Domingo Tavella is Principal of Octanti Associates, a consulting firm focused on advanced quantitative modeling in finance and marketing. Dr. Tavella has over 25 years of mathematical and computational modeling experience in fields ranging from aerodynamic design and biomedical simulation to computational finance and marketing modeling. He holds a Ph.D.
in Aeronautical Engineering from Stanford University and an MBA in Finance from UC Berkeley.

CHAPTER 1
Introduction

■ Strategic importance of metrics, marketing research and data mining in today’s marketing world

Today’s marketing executives are under significant pressure to be accountable for their companies’ returns on investment both in the boardroom and in front of their shareholders. The following excerpt from Business Week by Brady, Kiley, and Bureau Reports (Farris, Bendle, Pfeifer and Reibstein 2006) vividly encapsulates this shift in what is expected of marketing executives.

‘For years, corporate marketers have walked into budget meetings like neighborhood junkies. They couldn’t always justify how well they spent past handouts or what difference it all made. They just wanted more money – for flashy TV ads, for big-ticket events, for, you know, getting out the message and building up the brand. But those heady days of blind budget increases are fast being replaced with a new mantra: measurement and accountability.’

As pressure for accountability cascades through an organization, every functional group is under scrutiny, and those who cannot quantify their impact on generating satisfactory returns on investment are placed in a vulnerable position. At downsizing or budget reduction time, marketing executives are in the front line. Marketing, as it turns out, is among those corporate functions that are under the closest scrutiny. In recent years, there has been increased awareness and a stronger motivation among marketing professionals to quantify returns on investment. However, there is also a challenge in selecting the proper tools for measuring market returns from the large number of strategic and analytic tools that have emerged in the past decade.

Planning, research, execution, and optimization are the four key stages in marketing efforts. The objective of the planning stage is to define the appropriate metrics for measuring marketing returns. The number of metrics needs to be kept under control to ensure that the measuring task is achievable. In the research stage, marketing research is done to have a better understanding of the overall market opportunities and the competitive landscape. In the execution stage, effective implementation is an essential requirement for the success of the marketing effort. In the optimization stage, marketing strategies and tactics are optimized and fine-tuned on an ongoing basis.

■ The role of metrics

In the previous section, we alluded to the need for defining marketing metrics at the planning stage. A metric is a variable that can be measured and used to quantify the performance of a marketing effort. Metrics fall into the following categories: return metrics, investment cost metrics, operational metrics, and business impact metrics. It is important to understand the roles that different types of metrics play. Return metrics are often referred to as key performance indicators (KPI) or success metrics. The costs of marketing programs, goods sold, and capital are investment cost metrics that must be optimally related to metrics measuring investment returns. Operational metrics influence the performance of return metrics (most of the metrics we consider fall under this category), and a thorough understanding of their impact on return metrics is essential in order to track those with the highest potential.
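As an illustration of how that influence can be quantified in practice, the short sketch below (a minimal Python example assuming pandas is available; the metric names and figures are hypothetical) ranks candidate operational metrics by the strength of their correlation with a return metric such as revenue.

```python
import pandas as pd

# Hypothetical weekly campaign data: one return metric (revenue) and a few
# candidate operational metrics pulled from the firm's reporting systems.
data = pd.DataFrame({
    "revenue":     [120, 135, 150, 142, 160, 171, 165, 180],
    "email_opens": [4000, 4200, 4800, 4500, 5100, 5400, 5200, 5800],
    "site_visits": [9000, 9100, 9600, 9300, 9900, 10200, 10000, 10600],
    "call_volume": [300, 280, 310, 295, 305, 290, 315, 300],
})

# Rank operational metrics by the absolute strength of their association with
# the return metric; weakly associated metrics are candidates to stop tracking.
ranking = (
    data.corr()["revenue"]
        .drop("revenue")
        .abs()
        .sort_values(ascending=False)
)
print(ranking)
```

In a real engagement this screening would cover far more metrics and periods, and simple correlation would only be a first filter before more formal modeling.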
One common mistake in marketing is to invest significant resources to track hundreds of operational metrics without precisely quantifying whether they significantly influence success. Finally, it is essential to understand how marketing investment impacts a company’s financial performance. Ideas such as cash flow analysis or economic value added (EVA) have been utilized to link marketing investment and company financial performance (Doyle 2000).

■ The role of research

In essence, marketing research consists of the discovery and analysis of information for understanding the market that a particular firm faces. The American Marketing Association (AMA) offers a comprehensive definition of marketing research (Bennett 1988).

‘Marketing research links the consumer, customer, and public to the marketer through information – information used to identify and define marketing opportunities and problems; generate, refine, and evaluate marketing actions; monitor marketing performance; and improve understanding of marketing as a process. Marketing research specifies the information required to address these issues; designs the method for collecting information; manages and implements the data collection process; analyzes the results; and communicates the findings and their implications.’

Since customers are key components of a market, customer research should also be considered as part of marketing research. Marketing research has been present in the corporate world for decades. Its applications mainly focus on market sizing, market share analysis, product concept testing, pricing strategies, focus groups, brand perception studies, and customer attitude or perception research. The following examples are typical applications of marketing research to address business problems. Although these examples remain fairly common marketing research applications, they are somewhat limited in the whole scheme of marketing investment.

● Running a focus group to evaluate customer experience in certain retail bank branches
● Determining the feasibility of a full product rollout by first conducting a test in a small and easy-to-control market
● Conducting a recall test to determine a TV advertisement’s impact on product awareness
● Compiling market share information for a briefing to a group of industry analysts
● Conducting a focus group to evaluate new product features.

Marketing research groups are often spread across various corporate functions such as corporate communications, public relations, corporate marketing, segment marketing, vertical marketing, business units, and sales. Under such an organizational setup, the various marketing research efforts in a particular firm serve specific purposes and are sometimes disconnected from each other. In recent years, there has been recognition that optimal synergy among research teams requires centralization of the marketing research teams.

The recent economic climate has fostered a broader application of research to marketing investment. For securing resources and funding, marketing investment plans need to be justified by a reasonable level of returns, and this justification needs to be backed up by facts, forecasts, data, and analysis of opportunities. Marketing research generates market opportunity information ideal for supporting such marketing investment plans. For instance, one important question to address is the geographical allocation of marketing investment.
Marketing research can be used to determine market opportunities by geography and to drive optimal investment decisions. With increasing frequency, marketing executives at major corporations are asked to submit their annual budget plans with forecasts of corresponding returns on investment. The best practices report Maximizing Marketing ROI by the American Productivity and Quality Center (APQC) in conjunction with the Advertising Research Foundation (ARF) reported the following findings (Lenskold 2003).

● The pressure is on marketing to demonstrate a quantifiable return and on CEOs to deliver value to their stockholders and business alliance partners
● ROI-based marketing is sought by more marketers
● ROI-based models encourage decision makers to challenge and revise the budgeting process.

■ The role of data mining

Berry and Linoff give the following definition of data mining (Berry and Linoff 1997).

Data mining is the process of exploration and analysis, by automatic or semi-automatic means, of large quantities of data in order to discover meaningful patterns and rules.

In essence, data mining is the application of statistical methodologies to analyzing data for addressing business problems. While marketing research allows for opportunities to be identified at a macro level, data mining enables us to discover granular opportunities that are not immediately obvious and can only be detected through statistical techniques and advanced analytics. High-level insights provide directional guidance while granular detail facilitates optimization, execution, and tactics. Insight garnered through marketing research can help drive data mining analysis by providing the initial direction. Conversely, results from data mining analysis can be used to refine high-level strategies. Marketing research and data mining are two disciplines that are complementary to each other, and there is growing awareness of the value added that these two disciplines combined can provide.

■ An effective eight-step process for incorporating metrics, research and data mining into marketing planning and execution

The following flowchart (Figure 1-1) summarizes a step-by-step approach for incorporating metrics, marketing research, and data mining into marketing planning and execution.

Figure 1-1 An effective eight-step process for incorporating metrics, marketing research, and data mining into marketing planning and execution.

Step 1: identifying key stakeholders and their business objectives

It is crucial to identify the key stakeholders that will support a marketing effort and those that will implement the recommendations from research and analysis. Buy-in from key stakeholders throughout the process is essential for getting analytic results accepted and implemented. Key stakeholders need to quantify their business objectives and define such objectives as goals to be achieved. An objective might be to increase sales revenue.
An increase in revenue by a specific percentage over the course of a year, therefore, is a quantified objective to be reached. An objective that is not quantified is hard to measure against and should not be used to derive investment strategy.

Step 2: selecting appropriate metrics to measure marketing success

A marketing plan requires that metrics should be clearly defined from the outset since the selection of appropriate metrics can direct resources to optimal use. Multiple metrics may need to be examined simultaneously to glean the insights we seek. Multiple metrics can be used to validate one another and maximize the accuracy of the information gathered. A single metric such as revenue growth alone might not shed as much light on the true opportunity as revenue growth and market share information combined. If the business objective is to increase sales revenue by a given amount, then naturally sales revenue is the appropriate return metric to track. In the case of an advertising program, brand equity (the monetary worth of a brand) and brand awareness are the appropriate return metrics to track. Brand equity is defined as the net present value of the future cash flow attributable to the brand name (Doyle 2000) while brand awareness is the level of exposure and perception that customers have about a particular brand.

Step 3: assessing the market opportunity

Market opportunity assessment consists of addressing four fundamental questions.

1. Where are the market opportunities?
2. What are the market segments?
3. What is the size of each segment?
4. How fast does each segment grow?

Market opportunity information can be acquired through multiple approaches. One approach is through exploration of publicly available news and existing company internal data. Another approach is leveraging third party marketing research sources, which offer a wide range of forecasts on market opportunities by segment. These forecasts, which consist of both opportunity size and growth information, tend to be driven by different assumptions. In situations where market opportunity information is not readily available, customized research is required to gather the information.

Step 4: conducting competitive analysis

In the absence of competition, a company can take full advantage of market opportunities. With competition, however, companies can only realize market opportunities by understanding and outperforming their competitors. As Aaker points out, one important reason why the Japanese automobile firms were able to penetrate the US market successfully, especially during the 1970s, is that they were much better than US firms at doing competitive analysis. David Halberstam described the Japanese effort at competitor analysis in the 1960s: ‘They came in groups… They measured, they photographed, they sketched, and they tape-recorded everything they could. Their questions were precise. They were surprised how open the Americans were’ (Aaker 2005). Competitive intelligence is an extremely important discipline in the world of marketing research and data mining. A combination of survey data and real life transaction data can be used to analyze and track competitive information. Part of a competitive intelligence analysis is to objectively assess product features, pricing, and brand value of the key players in a market.
Product features that meet customer needs represent competitive advantages, and pricing is often used as a tool for gaining market share at the expense of profitability. Since brand perception often affects purchasing decisions it is important to incorporate brand strength and weakness analysis into competitive intelligence.

Step 5: deriving optimal marketing spending and media mix

After the fundamental information on market opportunities and competitor landscape has been collected, we proceed to determine the optimal marketing spending given a business objective. As we will elaborate in Chapter 2, there are numerous analytical approaches for modeling optimal marketing spending. Optimization involves maximization or minimization of a particular metric such as maximization of profit and minimization of risk. Maximizing profit is the most common objective in optimization of marketing spending. Some companies may choose to maximize revenue regardless of profitability, but doing so imperils the firm’s long-term value.

Step 6: leveraging data mining for optimization and getting early buy-in and feedback from key stakeholders

The high-level and directional insights into market opportunities provided by marketing research serve as the foundation for building a high-level marketing strategy. However, implementation of a high-level strategy through tactics requires significant analytical work. This is where data mining adds value by delineating a ‘how to’ road map to realize the opportunities uncovered by research. Marketing research could identify a geographic area as the best opportunity. Since it is very costly to target every prospect in this geography, it is necessary to select a target list for a marketing campaign, which requires building a response model to predict the likelihood of a prospect’s response. Response modeling requires statistical data mining techniques such as trees and logistic regression. Soliciting key stakeholders’ feedback and input in the data collection, research and data mining processes can help fine-tune the accuracy and objectivity of the data mining effort by removing potential roadblocks and barriers in the processes.

Step 7: tracking and comparison of metric goals and results

In the final presentation on the performance of a marketing campaign, it is essential to compare results derived from the application of the selected key metrics against the initial business goal. In a successful marketing campaign where goals are achieved, effective strategy and tactics can be applied to future campaigns. In a failed marketing campaign where the result trails the goal, areas of improvement for strategy and tactics can be identified for improving the performance of future campaigns. The final presentation of any research or data mining project is a decisive factor for the success of the project. Good research or data mining work poorly presented will fail to gain adoption and implementation. We have been the victims of speakers who did not know how to ‘work an audience’, to bring them to the point where they are quite ready to accept what is being recommended (Blankenship and Breen 1995).

Step 8: incorporating the learning into the next round of marketing planning

Learning from past programs needs to be incorporated into the next round of marketing planning as an ongoing optimization process, a practice that ultimately leads to a competitive advantage.
Over time, this learning accumulates into internal and proprietary market and customer intelligence to which competitors have no access.

■ Integration of market intelligence and databases

Market intelligence refers to insights generated from marketing research or data mining. Market intelligence provides the maximum value and insight when its components and parts are woven together to depict an overall picture of the market opportunities and challenges. Information on revenue growth, competitors, or market share in isolation does not provide significant value, since a company may be growing its revenue but at the same time losing market share if its competitors grow faster. Information on past customer purchase data can often be misleading if the future needs of the same customers differ drastically from their past needs.

To facilitate building market and customer intelligence, it is necessary to have integrated database systems that link together data from sales, marketing, customer, research, operations, and finance. Although not a requirement, ideally all the data would be maintained on the same hardware system. If there is more than one single database, then marketing, sales, customer, research, and finance databases need to be related through some sort of identifier such as customer ID, campaign or program code, date of purchase, and transaction ID.

Marketers often encounter data quality challenges. The following is a list of common data quality issues (Groth 2000).

● Redundant data
● Incorrect or inconsistent data
● Typos
● Stale data
● Variance in defining terms
● Missing data.

The best strategy for dealing with data quality is to make sure that key stakeholders are fully aware of any data imperfections. Very often these same key stakeholders can help drive efforts for cleaning and standardizing the data. Poor data quality arises due to many factors, not the least of which is erroneous data from original data source systems. These source systems may include systems for Enterprise Resource Planning (ERP), Point of Sale (POS), Financial, Customer Relationship Management (CRM), Supply Chain Management (SCM), marketing research, campaign management, advertising servers, e-mail delivery, web analytic tools, and call centers. Firms should consider establishing an automatic process that checks and corrects data input into the source systems.

Market opportunity forecasts created by internal departments may vary from those provided by external research firms. The former are often used for setting sales goals and as a result tend to be more conservative while the latter tend to be more aggressive to accommodate a broad set of objectives and assumptions of research subscribers. This difference may lead to inconsistencies that make it difficult to assess the accuracy of the data.

■ Cultivating adoption of metrics, research and data mining in the corporate structure

Given the importance of metrics, research, and data mining, having a team specialized in these areas working closely with all key business functional groups can be a competitive advantage. In high-tech industries where sales and marketing groups are often run as separate groups, it is imperative that a dedicated analytic team interface with both marketing and sales groups to ensure proper planning and execution. When sales and marketing agree upon common metrics for setting their benchmarks, the two groups can work effectively together.
If sales and marketing have different assessments of market potential, the two groups will likely create unsynchronized or even conflicting goals in their marketing and sales programs, which may result in suboptimal execution of the overall marketing effort. The following are additional tips for successfully incorporating research and analytics into the corporate structure.

Identification of key required skills

Skills in three key disciplines, metrics measurement, marketing research, and data mining, are required for assembling a successful research and analytic team effort. Besides discipline-specific capabilities, there are additional skills that are common requirements across the three disciplines.

Common required skills

Clear communication enables a research and analytic team to effectively acquire feedback and articulate findings, thereby facilitating buy-in from key stakeholders. Many analytic professionals are used to communicating in technical terms and have difficulty translating technical terms into plain everyday language. This imposes an extra burden on analytic professionals when explaining analytic concepts to their nonanalytic peers. Two of the most common communication issues are a lack of a clear understanding of the questions asked, and the tendency to give unnecessary information when delivering an answer. An executive who asks a question ‘What is the expected return of the program?’ expects to get a response clearly stating the expected return. Rather than giving a direct answer, many analytical professionals tend to give a vague response and then quickly go on and elaborate on the data mining techniques applied even when the executive does not specifically ask about the data mining techniques being used. The first step toward resolving this communication issue between analytic and nonanalytic professionals is to cultivate ‘active listening’ skills. Active listening requires understanding of what others ask before giving replies.

Another required common skill is the ability to focus on truly important tasks and to be able to prioritize tasks based on predetermined criteria, a significant challenge when confronting multiple projects. One way to facilitate focus and prioritization is to establish and formalize a standard engagement process where given criteria are used to determine the priority level of a project. Such criteria may include expected return on investment, turnaround time, resource requirement, revenue potential, and risk level.

Another required skill across metrics, research, and data mining is experience and training in marketing and knowledge of the company line of business. The type of marketing experience and training required depends on the overall company marketing culture and use of communication media. Some corporations rely on traditional marketing communication channels such as print and catalog while others focus on new media such as e-mail, search, blog, iPod, and web marketing. Familiarity with the specific types of marketing communication channels that a firm uses allows for derivation of deeper insights from analysis and more substantial business recommendations.

Metrics-specific required skills

Metrics-specific skills are also called measurement skills, which in the marketing consulting world refer to the identification and tracking of marketing campaign performance.
Metrics skills include hands-on experience in tracking and measuring performance of a wide array of marketing communication channels. These communication channels include, but are not limited to, TV, radio, direct mail, e-mail, telemarketing, web marketing, online or print advertising, search marketing, social marketing (blog, community marketing), and podcast. Usage of metrics does not require advanced data mining or statistical skills; rather it requires hands-on experience in marketing campaign planning, management, execution, and performance tracking and analysis. Metrics experts are expected not only to have extensive understanding of marketing channels and programs, but also to have clear insights into what is important to the overall marketing business. Before selecting any metrics, metrics experts conduct discovery meetings with the key stakeholders to fully understand their goals and propose metrics that are aligned with these goals. Metrics identification and measurement benefit greatly from strong reporting skills, such as the ability to create reports using standard tools such as Excel, Access, and OLAP tools such as Business Objects, Brio, Crystal Reports, and Cognos. Finally, metrics expertise also includes an understanding of both the potential and the limitations of data for constructing or deriving metrics. Practitioners should seek alternative data sources or metrics if the existing sources have poor data quality.

Research-specific required skills

There are two basic types of marketing research: syndicated and customized research. Marketing research skills are often acquired through training in the social science disciplines. Syndicated research expertise includes experience in effective acquisition of data from syndicated research vendors, and management of vendor relationships and research subscriptions. Customized research expertise consists of skills for designing and managing projects, survey research, focus groups, vendor selection, requests for proposal (RFP) process, and presentations of findings and results. Industry and product knowledge is also an important required attribute for customized research in that it allows for better decisions over vendor selection and extraction of insight from studies. Knowledge and experience in economics, which entail skills in collecting, analyzing, and interpreting market trends, economic climate data, and economic impact on market opportunities, are valuable attributes for both syndicated research and customized research.

Data mining-specific required skills

Practice of the data mining discipline is driven by two main skill sets: statistics and information technology. The required statistical skills include the abilities to conduct exploratory analysis and to apply a broad range of data mining techniques to solving marketing problems. The information technology skills include expertise in database structure, data processing and quality control, and data extraction skills such as Structured Query Language (SQL).

Creating an effective engagement process

An engagement process needs to be in place to effectively manage and enhance the research and analytic efforts. Without an engagement process, a research and analytic team passively takes on ad-hoc requests where prioritization may be based solely on the particulars of the corporate pecking order without regard for the most relevant business objectives.
In such situations efforts may be driven and prioritized by rationales other than project returns on investment. The following list describes a step-by-step engagement process.

1. Choosing a point of contact person within a research and analytic team: This individual should have an overall understanding of the capabilities of the team. Ideally, the point person should be a recognized project manager who can effectively manage timelines, collect requirements, transmit the requirements back to the team, and build relationships.

2. Determining the communication channels through which a research and analytic team can be engaged: There are numerous ways in which an engagement can take place, such as phone, e-mail, and the web. Online request forms can be used to gather business requirements to be followed up with face-to-face needs assessment meetings if needed. The research and analytic management team can review incoming requests on a regular basis. Written requests allow for systematic documentation of requirements and customer needs.

3. Selecting the criteria, process, and frequency for project prioritization: Project prioritization involves ranking projects on the basis of predetermined criteria and using the ranking to determine the order for executing the projects. These criteria include project return on investment, incremental revenue, and incremental number of leads generated. Key team members and stakeholders should be involved in the project prioritization process by holding periodic discussion and prioritization meetings. The prioritization frequency refers to the frequency of holding discussion and prioritization meetings and depends on the anticipated duration of projects. It is common to adopt a monthly frequency of prioritization since overly frequent prioritization is not necessary and can disrupt work schedules.

4. Clearly communicating project delivery timelines and deliverables: After the priority of a project is determined, the group point person needs to communicate the project timeline and deliverables to those who request the group’s service, and effectively manage their expectations throughout the project duration.

Promoting research and analytics

A research and analytics service can be promoted to potential internal and external customers in a number of ways. One approach is the distribution of a periodical e-newsletter to communicate the offerings, the accomplishments, and the future project pipeline of such a service to key stakeholders. Another approach is the creation of a service web site with the following key sections.

● Home page
● Who we are
● Engagement process
● Services
● Case studies and success stories
● Events
● Contact us.

■ Book outline

The remaining chapters of the book are organized as follows.

Chapter 2: marketing spending models and optimization

Chapter 2 introduces the concept of marketing spending modeling for deriving an optimal overall marketing spending budget and effectively allocating this budget across different product categories or marketing
It then discusses the various types of metrics commonly used in marketing such as return metrics, investment metrics, and operational metrics in the context of a sales funnel. This chapter also gives an overview on the various marketing communication channels and how they are usually used across the five key stages in a sales funnel. Chapter 4: multi-channel campaign performance reporting and optimization This chapter discusses how to report and optimize the overall performance of marketing campaigns that utilize multiple communication channels. The performance reporting section examines the identification and aggregation of common return metrics across multiple communication channels. The performance optimization section of the chapter discusses data mining on operational metrics to uncover the operational metrics with the highest influence on return metrics. Chapter 5: understanding the market through marketing research This chapter discusses creating a deeper understanding of the market through marketing research. Understanding of the market includes knowledge and insights on the market opportunity and segmentation, routes to market, and competitive landscape. This chapter also reviews marketing research fundamental topics such as syndicated research, customized research, primary data, secondary data, survey design and sampling, focus group, and panel group. Chapter 6: data and statistics overview This chapter discusses data and statistical concepts that drive selection of data mining techniques for solving marketing problems. Topics such as data types, data distributions, and sampling methodologies are reviewed in detail. Introduction Chapter 7: introduction to data mining This chapter examines an array of widely utilized data mining techniques applied to marketing by providing a theoretical overview of each technique and discussing specific examples for some of the techniques. Standard data mining procedures such as data exploration, modeling, validation, and testing are introduced. The following data mining techniques are covered. ● ● ● ● ● ● ● ● ● ● ● ● ● ● Association analysis Analysis of variance (ANOVA) and multivariate analysis of variance (MANOVA) Canonical correlation analysis Cluster analysis Collaborative filtering Conjoint analysis Correspondence analysis Decision tree Discriminant analysis Factor analysis Logistic regression Multi-dimensional scaling (MDS) Principal component analysis (PCA) Time series. Chapter 8: audience segmentation Chapter 8 presents four case studies on audience segmentation to illustrate the application of four data mining techniques: cluster, CART, CHAID, and discriminant analysis. The four case studies are on behavior and demographics segmentation, value segmentation, response behavior segmentation, and customer satisfaction segmentation. Each case study gives the background of a business problem, the data mining technique applied to address the problem, the data mining model building and validation processes, and the marketing recommendations resulted from the data mining analysis. Chapter 9: data mining for customer acquisition, retention, and growth This chapter discusses three case studies on targeting, growth and retention models to demonstrate the application of the logistic regression technique. Each case study examines the background of a business problem, the data mining technique used to solve the problem, the 17 18 Data Mining and Market Intelligence data mining model building and model validation processes, and the recommendations. 
Chapter 10: data mining for cross-selling and bundled marketing This chapter discusses two case studies on e-commerce and targeted online advertising promotions. In both case studies, the fundamental data mining techniques for cross-sell and up-sell are applied to real marketing scenarios. Chapter 11: web analytics The chapter introduces the fundamentals of web analytics and its key metrics by business objectives such as lead generation and online e-commerce. It also introduces syndicated research tailored for understanding web marketing trends and online customer behavior. Chapter 12: search marketing analytics This chapter discusses the principles of three search marketing disciplines: search engine optimization (SEO), search engine marketing (SEM), and onsite search. The chapter also provides links to web resources on subjects such as key words, domain, meta tags, and pay per click. ■ References Aaker, D. Strategic Market Management, 7th ed. John Wiley & Sons, New York, 2005. Bennett, P.D. Dictionary of Marketing Terms. American Marketing Association, Chicago, Illinois, 1988. Berry, M.J.A., and G.S. Linoff. Data Mining Techniques: For Marketing, Sales, and Customer Support. John Wiley & Sons, New York, 1997. Blankenship, A.B., and G.E. Breen. State of the Art Marketing Research. Chapter 12, page 277. NTC Business Books, Lincolnwood, Illinois, 1995. Doyle, P. Value-Based Marketing – Marketing Strategies for Corporate Growth and Shareholder Value. John Wiley & Sons, New York, 2000. Farris, P.T., N.T. Bendle, P.E. Pfeifer and D.J. Reibstein. Marketing Metrics: 50+ Metrics Every Executive Should Master. Wharton School Publishing, Upper Saddle River, New Jersey, 2006. Groth, R. Data Mining – Building Competitive Advantage. Prentice Hall PTR, Upper Saddle River, New Jersey, 2000. Lenskold, J.D. Marketing ROI – The Path to Campaign, Customer, and Corporate Profitability. McGraw Hill, New York, 2003. CHAPTER 2 Marketing Spending Models and Optimization This page intentionally left blank Two of the most important questions in marketing refer to how much should be spent and how should the budget be allocated. These questions can be answered in more than one way, depending on the particulars of the firm’s circumstances and the availability of data. In this chapter we address these important issues in an econometric framework. ■ Marketing spending model The primary objective of a marketing spending model is to establish a relationship between marketing investments and marketing returns. Marketing returns are the benefits that a firm receives when it invests in marketing, such as sales value or number of product units sold. A properly devised marketing spending model also helps us understand the way these variables interact, allowing us to gain deeper insights into what is most effective at influencing marketing return. Most of us are aware of potential diminishing returns of marketing spending. That is to say, as market spending increases, its incremental or marginal impact eventually starts to decrease. This is just one of the many features of the complex relationship between marketing spending and revenue. The exact relationship between marketing return and marketing spending can take on many different mathematical forms depending on factors such as the type and frequency of data and the industry segment where the model is applied. Typically, various functions need to be explored to derive the best model and the best corresponding function form. 
In this book, we will use the term ‘marketing spending model’ to name models that describe the relationship between marketing spending and marketing return. In addition to marketing spending as an independent variable impacting marketing returns, there are other potential independent variables such as seasonality, product price, and the state of the economy. The functional form that describes the relationship between marketing spending and return, which may be quite general and complex in nature, must above all lend itself to calibration in a stable and predictable way.

A comprehensive and detailed description of marketing spending model mathematics is beyond the scope of this book. Here we give a brief summary of the issues involved to guide the reader into a more extensive treatment of the subject, such as the excellent book by Hanssens, Parsons, and Schultz, where an extensive list of valuable references can be found.

In some sources, the term ‘market response model’ is used in place of ‘marketing spending model’, used in this book. There are two reasons for settling on the term ‘marketing spending model’. One reason is that it clearly conveys the message that such a model is related to marketing spending. Another reason is that it avoids confusion with the terminology used in targeting analysis, where ‘market response model’ is often used to refer to ‘response targeting models’.

For any marketing spending model to be effective, it must be data based. This reliance on data is what configures an empirical marketing spending model (Hanssens, Parsons and Schultz 2001). Data enters the formulation and calibration of a marketing spending model in two primary forms: sequential and cross-sectional. Sequential data comes in the form of time series information, consisting of values at discrete points in time. Cross-sectional data describes values that occur at the same point in time, where these values can belong to time series. In general, when generating cross-sectional data, we deal with multidimensional time series – that is, discrete and simultaneous information of several variables. Although the primary framework in setting up and calibrating marketing spending models is data based, at the inception of a new marketing plan the relevant data may not be available. In this case, the data-based model is preceded by an initial growth stage, where parameters are set based on the subjective judgment of experienced managers.

Our assumptions about markets as well as the level of detail that we want to capture influence the modeling task. We may, for example, assume that the market parameters are stationary, such as constant demographics and employment level, or that market parameters are evolving very slowly in comparison with our planning horizon. In such cases, our models will be designed to respond appropriately in stationary markets. A different and more complex situation arises if markets are evolving rapidly when compared with our planning horizon. In such a case, a different class of models would be required to capture the intrinsic dynamics of the market.

If we consider the market to be stationary, there are two types of models we can postulate, each corresponding to a different level of analysis. At a simpler level of analysis, we may assume that the sales and drivers adjust instantaneously as their level changes.
This means that whatever functional relationships we formulate between our variables only involve time to the extent that the variables change in time, not the rate of change of the variables in time. A situation like this reflects equilibrium among variables and the types of models appropriate for this case are referred to as static models. From the perspective of time series data, static response models involve the marketing investment variables evaluated at a single point in time. Simple regression models fall in this category.

Within the assumption of stationary markets, at a more complex level of analysis we may consider the time of adjustment among variables as their levels change. This means that our model will capture not only the time changes in the levels of the variables, but also the rates at which the variables change in time. A model capable of capturing the noninstantaneous adjustment among variables is referred to as a dynamic model. Dynamic models involve the marketing investment variables evaluated at multiple points in time. From a time series’ point of view, this implies that the model will involve lags and leads. Simple auto-regressive models are examples of this category.

In addition to the two time effects we have described so far – the response to the level of variables as opposed to both the level and the level changes of variables – there is yet another time effect imposed by market fluctuations. To properly capture market fluctuations we must formulate models in nonstationary or evolving markets. Models of this type must be able to capture the nonstationarity of the statistical properties of time series. Auto-regressive moving average models (ARMA) are examples of this type of model. Here we will limit the discussion to stationary markets. This is the situation we face when our marketing planning horizon, usually on a quarterly basis, is relatively short compared with the time evolution scales of economic or demographic effects.

Static models

A model can be very complex if the full interaction among variables is taken into consideration. A simple static model is of the form

Q = c_0 + c_1 f(X)    (2.1)

where Q is the dependent variable of interest, such as sales volume, X is the independent variable, in this case the marketing spending in the marketing plan, and c_0 and c_1 are coefficients to be estimated. Independent variables are also called explanatory or predictive variables. If the function f(X) is linear, the model is referred to as a linear model. Notice that this notion of linearity is not the same as the concept of linearity we encounter in estimation problems. In estimation, linearity refers to the way in which the coefficients c_0 and c_1 enter in the functional form of the model. In estimation problems, the formulation is called linear if the estimation coefficients enter linearly, even if the dependent variables appear in a nonlinear form. The important distinction in this regard is that as long as the estimation coefficients appear linearly in the model they can be estimated by linear regression.

Elasticity

An important concept to characterize a model is the notion of elasticity (Hanssens, Parsons and Schultz 2001). Elasticity is the ratio of the relative changes of the dependent and independent variables.
Mathematically, for infinitesimal changes in X and Q, the elasticity can be written in terms of the partial derivative

e_X = (∂Q/∂X)(X/Q)    (2.3)

Simple linear model

In the simplest case of the linear model

Q = c_0 + c_1 X    (2.4)

the elasticity is

e_X = c_1 X / (c_0 + c_1 X)    (2.5)

A model of this form reflects the assumption that additional marketing spending results in the same increment in the dependent variable, regardless of level. This situation is referred to as constant return to scale. It is more realistic to assume a situation of diminishing returns, where additional marketing spending brings about increasingly smaller responses. The constant elasticity model we analyze in the next section accomplishes this objective.

Power models

An interesting model that accomplishes diminishing returns is known as the constant elasticity model

Q = aX^b    (2.6)

where 0 < b < 1. This is a particular case of a power model. The elasticity of this model is constant and equal to b, which gives intuitive meaning to this coefficient. However, the price we pay for having constant elasticity is that the rate of change of Q for vanishing values of X is infinitely large, as shown in Figure 2-1.

Figure 2-1 Sales volume as a function of marketing effort in the constant elasticity model.

Another attractive property of this model is that we can estimate the parameters with linear regression by working with the logarithms of both sides of Eq. (2.6)

log Q = log a + b log X    (2.7)

The linear regression gives us b directly as the slope and log a as the intercept, from which we can extract a.

The functional form in Eq. (2.6) can be generalized to the case of multiple independent variables in several ways. In the case of two variables, we have

Q = a_1 X_1^{b_1} + a_2 X_2^{b_2} + a_{12} X_1^{b_{12}} X_2^{b_{21}}    (2.8)

This model captures the interaction of the independent variables in the last term, but is no longer a constant elasticity model. We can maintain the desirable property of constant elasticity by defining our model as follows

Q = a_{12} X_1^{b_{12}} X_2^{b_{21}}    (2.9)

Notice that the unrealistic behavior of infinitely rapid change of Q for vanishing values of the independent variables persists in this formulation.
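As a rough illustration of the estimation step in Eq. (2.7) – a sketch with invented data, assuming the simple single-variable form of Eq. (2.6) – the constant elasticity model can be fitted by regressing log Q on log X:

import numpy as np

# Hypothetical spending X and sales Q assumed to follow Q = a * X**b with noise
rng = np.random.default_rng(0)
X = np.linspace(5, 100, 40)
Q = 30.0 * X**0.4 * np.exp(rng.normal(0, 0.05, X.size))

# Linear regression on log Q = log a + b log X
A = np.column_stack([np.ones_like(X), np.log(X)])
(log_a, b), *_ = np.linalg.lstsq(A, np.log(Q), rcond=None)
a = np.exp(log_a)
print(f"estimated a={a:.2f}, b={b:.3f} (the elasticity is constant and equal to b)")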
S-shaped models

If we can argue that the nature of the return changes from an increasing one to a decreasing one as a function of the independent variable, we can consider an S-shaped curve. In an S-shaped model, there is a transition from a convex to a concave return represented by an inflexion point. A simple function that represents such a shape is the following exponential model

Q = exp(a − b/X)    (2.10)

where both a and b are positive. It is easy to see that the elasticity of this model decreases with X

e_X = b/X    (2.11)

Figure 2-2 shows the overall shape of this functional form. The inflexion point is located at X = b/2. The fact that this function starts out with a zero slope means that there is no response to small marketing efforts.

Figure 2-2 Sales volume in the exponential model, showing the inflexion point.

Modifications to the S-shaped model

Other possible modifications to the S-shaped model include imposing a saturation level, reflecting the fact that sales may not increase beyond a certain level of effort, or a sales floor, indicating that sales may still take place in the total absence of any marketing effort. Functions of this type, capable of describing general S-shapes, are called sigmoid functions, of which the well-known logistic function is a particular case.

An example of a nonsymmetrical one-dimensional logistic model that incorporates both a sales floor Q_L and a saturation level Q_U is the following

Q = Q_L + (Q_U − Q_L) aX^b / (1 + aX^b)    (2.12)

A plot of this function is shown in Figure 2-3. The function starts at Q_L and asymptotes to Q_U as X grows. If Q_L and Q_U are postulated, the parameters in Eq. (2.12) can be estimated using the logarithmic form of the function. For example, for the case of two independent variables we have

log[(Q − Q_L)/(Q_U − Q)] = log a + b_1 log X_1 + b_2 log X_2    (2.13)

Figure 2-3 One-dimensional logistic model with sales floor and saturation level.

Semilogarithmic model

A semilogarithmic function captures diminishing returns and as a result is a widely used functional form (Leeflang, Wittink, Wedel and Naert 2000). In a semilogarithmic model, the number of units sold Q and marketing spending X follow the relationship

Q = β_0 + β_1 ln X    (2.14)

A regression estimate of Q is

Q̂ = β̂_0 + β̂_1 ln X    (2.15)

We now apply this functional form in an optimization framework where we maximize profit with respect to marketing spending. For this exercise, we define profit as gross profit adjusted by marketing spending. Gross profit is the difference between revenue and the cost of producing a product or providing a service, without adjustment for items such as overhead or payroll. Consistent with this definition, profit is given by

P = (p − c)Q − X    (2.16)

where p and c are quantities independent of X representing the unit price and the unit variable cost of goods sold, respectively. To maximize profit, the following condition has to be met

∂P/∂X = ∂[(p − c)Q − X]/∂X = (p − c) ∂Q/∂X − 1 = 0    (2.17)

Based on Eq. (2.17) and replacing Q with its estimator, we get

(p − c) β̂_1 (∂ ln X/∂X) = (p − c) β̂_1 / X = 1    (2.18)

Therefore, the marketing spending that will optimize the profit of the marketing effort is

X = (p − c) β̂_1    (2.19)

Marketing spending model case studies

Next we discuss two case studies illustrating the use of static models. The first case study applies the formulation discussed in the previous section, while the second expands the analysis to include the residual effect of marketing expense over time.

Case study one

Assume a company spent $5100 on a suboptimal marketing effort, sold 800 units of product, and realized a profit of $34,900. We will apply a semilogarithmic model to the historical data of the firm to determine the optimal marketing spending that would have maximized the profit of the marketing effort, and then compare the optimal marketing spending with the amount that the firm actually spent.

To derive the optimal marketing spending that maximizes the profit of the marketing effort, we use the following parameters in Eqs. (2.15) and (2.19), based on historical data: p = $100, c = $50, β̂_0 = −20, β̂_1 = 120. Given these parameters, the optimal marketing spending is

X = (100 − 50) × 120 = 6000    (2.20)

The estimated number of product units sold given the optimal marketing spending is

Q̂ = β̂_0 + β̂_1 ln X = −20 + 120 ln(6000) ≈ 1024    (2.21)

The maximum profit of the marketing effort is then

P = (p − c) Q̂ − X = (100 − 50) × 1024 − 6000 = 45,200    (2.22)

Based on this analysis, we conclude that the optimal marketing spending should be $6000, which is $900, or 18%, above the amount that the company actually spent. If the company had spent $6000 on marketing, it would have produced a profit of $45,200, 30% higher than the profit it actually realized.
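The arithmetic of case study one can be reproduced in a few lines; this is only a check of Eqs. (2.19)–(2.22) under the stated parameters, not part of the original analysis:

import numpy as np

p, c = 100.0, 50.0           # unit price and unit variable cost
beta0, beta1 = -20.0, 120.0  # semilogarithmic model coefficients from the case study

X_opt = (p - c) * beta1                      # Eq. (2.19): optimal spending
Q_hat = beta0 + beta1 * np.log(X_opt)        # Eq. (2.15): estimated units sold
profit = (p - c) * Q_hat - X_opt             # Eq. (2.16): profit at the optimum
print(X_opt, round(Q_hat), round(profit))    # -> 6000.0 1024 45197 (the text rounds Q-hat to 1024 first, giving $45,200)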
Case study two

A dollar spent on marketing activities today drives not only sales today, but also sales in the future. The impact of advertising on generating marketing returns has a residual effect over time. In the next case study, we incorporate such residual effects in a time series setting (Leeflang, Wittink, Wedel, and Naert 2000). We assume that the residual effect decays geometrically, so that the fraction of the immediate return still realized at time t is γ^t, and that the simply compounded discount rate per unit time period is i. The estimated profit for marketing spending X is

P = (p − c)(β̂_0 + β̂_1 ln X) Σ_{t=0}^{n} γ^t/(1 + i)^t − X    (2.23)

Using the sum of a geometric series, this expression can be simplified as follows when n is large

P = (p − c)(β̂_0 + β̂_1 ln X) [1/(1 − γ/(1 + i))] − X    (2.24)

To maximize the profit of a marketing effort with respect to marketing spending, the following condition has to be satisfied

∂P/∂X = (p − c)(β̂_1/X)[1/(1 − γ/(1 + i))] − 1 = 0    (2.25)

The optimal marketing spending is given by

X = (p − c) β̂_1 [1/(1 − γ/(1 + i))]    (2.26)

Assume a company spent $8000 on a suboptimal marketing effort, sold 1000 units of product, and realized a profit of $42,000. We will apply a semilogarithmic model to the historical data of the firm to determine the optimal marketing spending that would have maximized the profit of the marketing effort, and then compare the optimal marketing spending with the amount that the firm actually spent. To derive the optimal marketing spending, assume now the following parameters: p = $100, c = $50, β̂_0 = −20, β̂_1 = 150, γ = 20%, and i = 3%.

Based on Eq. (2.26), the optimal marketing spending is

X = (100 − 50) × 150 × [1/(1 − 0.2/1.03)] ≈ 9307    (2.27)

Therefore, the maximum profit of the marketing effort that can be achieved is

P = (100 − 50)(−20 + 150 ln(9307)) [1/(1 − 0.2/1.03)] − 9307 ≈ 74,509    (2.28)

Based on this analysis, we conclude that the optimal marketing spending is $9307. The company actually spent $8000; this means it underspent by $1307, or 16%. If the company had spent $9307 on marketing, it would have produced a profit of $74,509, 77% higher than the actual profit realized.
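The figures in case study two follow directly from Eqs. (2.26)–(2.28); the short check below simply re-runs that arithmetic and is illustrative only:

import numpy as np

p, c = 100.0, 50.0
beta0, beta1 = -20.0, 150.0
gamma, i = 0.20, 0.03                         # residual (carryover) rate and discount rate

carryover = 1.0 / (1.0 - gamma / (1.0 + i))   # geometric-series factor in Eq. (2.24)
X_opt = (p - c) * beta1 * carryover           # Eq. (2.26)
profit = (p - c) * (beta0 + beta1 * np.log(X_opt)) * carryover - X_opt   # Eq. (2.24)
print(round(X_opt), round(profit))            # -> 9307 74507 (the text, with intermediate rounding, reports $74,509)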
Optimal multi-channel marketing spending allocation

In this section we discuss a model that can be used in either a multi-channel or a multi-product situation, where we evaluate the distribution of the total marketing spending across multiple marketing communication channels or products. This model is a modified version of a multi-product model originally developed by Doyle and Saunders (Leeflang, Wittink, Wedel, and Naert 2000).

Assume that there are n different marketing communication channels. Let Q_j denote the contribution of marketing communication channel j to the number of units sold, and let X_j denote the company's marketing spending on marketing communication channel j

Q_j = β_{0j} + β_{1j} ln X_j + ε_j    (2.29)

The regression estimate of the total number of units sold Q is given by

Q̂ = Σ_{j=1}^{n} (β̂_{0j} + β̂_{1j} ln X_j)    (2.30)

The profit, which is gross profit minus marketing spending, is

P = Σ_{j=1}^{n} cm_j (β̂_{0j} + β̂_{1j} ln X_j) − Σ_{j=1}^{n} X_j    (2.31)

where cm_j = p − c_j, and c_j is the unit variable cost of goods sold through marketing communication channel j.

The optimal marketing spending for marketing communication channel j results from

∂P/∂X_j = ∂[Σ_{j=1}^{n} (cm_j (β̂_{0j} + β̂_{1j} ln X_j) − X_j)] / ∂X_j = 0    (2.32)

The solution for X_j, the optimal marketing spending on marketing channel j, is

X_j = cm_j β̂_{1j}    (2.33)

The total marketing spending is

X = Σ_{j=1}^{n} cm_j β̂_{1j}    (2.34)

Therefore, the fractional budget allocation to marketing communication channel j is as follows

A_j = X_j / X = cm_j β̂_{1j} / Σ_{j=1}^{n} cm_j β̂_{1j}    (2.35)

Optimal marketing spending allocation by product

Assume there are m different products. Let Q_k denote the contribution of the company's marketing spending on product k to the number of units sold of product k, and let X_k denote the company's marketing spending on product k

Q_k = β_{0k} + β_{1k} ln X_k + ε_k    (2.36)

The estimate of Q_k is given by

Q̂_k = β̂_{0k} + β̂_{1k} ln X_k    (2.37)

The profit of the marketing effort – the estimated gross profit adjusted for marketing spending – is

P = Σ_{k=1}^{m} [cm_k (β̂_{0k} + β̂_{1k} ln X_k) − X_k]    (2.38)

where cm_k = p_k − c_k and p_k is the unit price for product k. The optimal budget for marketing product k must satisfy the following condition

∂P/∂X_k = ∂[Σ_{k=1}^{m} (cm_k (β̂_{0k} + β̂_{1k} ln X_k) − X_k)] / ∂X_k = 0    (2.39)

Therefore, the optimal marketing spending for product k is

X_k = cm_k β̂_{1k}    (2.40)

The total marketing spending is

X = Σ_{k=1}^{m} cm_k β̂_{1k}    (2.41)

The fractional budget allocation to product k is given by

A_k = X_k / X = cm_k β̂_{1k} / Σ_{k=1}^{m} cm_k β̂_{1k}    (2.42)

Environmental changes and seasonality

By environmental changes we mean situations where a driver of marketing return changes suddenly – or, more precisely, fast when compared with the marketing horizon – from one level to another. For example, a news report that suddenly exposes a product in a markedly more favorable or negative light will suddenly change the environment in which the marketing effort is being conducted. The occurrence of such a sudden change is a categorical rather than a numerical event. In the context of linear regressions, it is straightforward to incorporate these sudden changes through the introduction of a dummy numerical variable, Z, which takes on the value zero before the change happens and the value one after the change happens. Equation (2.4) is modified as follows

Q_t = c_0 + d_0 Z + c_1 X_t + d_1 Z X_t    (2.43)

where the subscript t makes explicit the fact that observations at period t may correspond to different values of the dummy variable. The coefficients are determined through regression. This formulation allows us to use the tools of linear regression to interpret the parameters in the model and to assess their confidence intervals. This idea can be easily extended to handle multiple environmental changes.

A particular case of environmental change is seasonality, where each season represents a distinct and sudden change in market conditions. Since there are four seasons, if we take one of them as a reference we need only three dummy variables to represent the changes due to the remaining seasons. Assuming the reference season is indicated by the index 1, we can extend Eq. (2.43) to handle seasonality as follows

Q_t = c_0 + d_{02} Z_2 + d_{03} Z_3 + d_{04} Z_4 + c_1 X_t + d_{12} Z_2 X_t + d_{13} Z_3 X_t + d_{14} Z_4 X_t    (2.44)

The dummy variable Z_i is one within season i and zero elsewhere. The dummy variable Z_1 does not appear in this expression because index 1 identifies the reference season. It is possible to extend the same idea to the case of multiple independent variables and to nonlinear functional forms.
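A rough sketch of the seasonal dummy-variable regression of Eq. (2.44), assuming simulated quarterly data with season 1 as the reference (all figures invented), looks like this:

import numpy as np

rng = np.random.default_rng(2)
T = 120                                   # ten years of monthly observations (hypothetical)
season = (np.arange(T) // 3) % 4 + 1      # quarters labelled 1..4; season 1 is the reference
X = rng.uniform(20, 80, T)                # marketing spending
Z = np.column_stack([(season == s).astype(float) for s in (2, 3, 4)])

# Simulate Q_t = c0 + sum_s d0s*Zs + c1*X_t + sum_s d1s*Zs*X_t + noise
true = dict(c0=50.0, c1=1.5, d0=(5.0, -3.0, 8.0), d1=(0.2, -0.1, 0.3))
Q = (true["c0"] + Z @ np.array(true["d0"]) + true["c1"] * X
     + (Z * X[:, None]) @ np.array(true["d1"]) + rng.normal(0, 1, T))

# Design matrix [1, Z2, Z3, Z4, X, Z2*X, Z3*X, Z4*X] as in Eq. (2.44)
A = np.column_stack([np.ones(T), Z, X, Z * X[:, None]])
coef, *_ = np.linalg.lstsq(A, Q, rcond=None)
print(np.round(coef, 2))   # estimates should be close to the simulated coefficients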
Before leaving this section on static models, we must emphasize that although in principle we can accommodate a large number of variables and changes in environmental conditions, the number of variables we can handle is determined by the statistical properties of the estimated parameters. This is typically a data issue. Unless sufficient and reliable data is available, estimation will lead to parameters affected by significant error. In such cases, the functional complexity of a model may be overwhelmed by the estimation error.

Dynamic models

The objective of a dynamic model is to capture the adjustment time between dependent and independent variables. Notice that this adjustment time is a separate concept from the fact that the market parameters themselves may be changing in time. Capturing the time of adjustment between dependent and independent variables imposes relationships between the variable levels and their rates of change. This is a problem that can be formulated in terms of differential equations, or in terms of discrete values in time series. We focus on the latter, since this is established practice in marketing analytics.

The dynamic adjustment between dependent and independent variables may respond to the dissemination of marketing information, to the anticipation of such information, or to a combination of both. Framing the problem of dynamic response in terms of dissemination of marketing information leads to the consideration of lags in a time series, while taking anticipation into account would result in the inclusion of leads in the time series analysis. Since the most common situation in practice is the delayed impact of a marketing effort, we focus on the former.

Simple lag model

To formulate the simplest lag model, we reconsider Eq. (2.4) to represent the situation where the effect of a marketing effort is felt k periods of time after the marketing effort is implemented. The following modification of Eq. (2.4) reflects this fact

Q_t = c_0 + c_k X_{t−k}    (2.45)

This means that the return Q_t, which we assume to be a constant value during period t, is the result of the effort X implemented k periods earlier. This formula can be generalized to K lag periods

Q_t = c_0 + Σ_{k=1}^{K} c_k X_{t−k}    (2.46)

As stated, this representation may present us with challenging calibration problems. We can get around the calibration issue by imposing additional structure on the right-hand side of Eq. (2.46), by making assumptions about the coefficients c_k.

Geometrically distributed lag model

In this particularly popular model, we assume that the impact of marketing spending on marketing performance decreases geometrically as the number of periods increases. We can formulate this model in terms of a parameter λ, which we can interpret as the fraction of a current marketing effort that carries over to marketing return in future periods. This parameter is called the retention rate and its value is typically around 0.5. The geometrically distributed lag model is as follows

Q_t = c_0 + c(1 − λ) Σ_{k=0}^{∞} λ^k X_{t−k}    (2.47)

Here, c_0, c, and λ are parameters to be estimated. Simple manipulations of Eq. (2.47) give the following expression (Clarke 1976)

Q_t = (1 − λ)c_0 + c_1 X_t + λ Q_{t−1} + (u_t − λ u_{t−1})    (2.48)

where c_1 = (1 − λ)c and u_t is an error term added to Eq. (2.47). We get a form suitable for regression estimation by setting w_t = u_t − λ u_{t−1} in Eq. (2.48).
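The following sketch (hypothetical data, white-noise errors rather than the exact moving-average error structure of Eq. (2.48), so a simplification) simulates a geometrically distributed lag process and then recovers the retention rate λ by regressing Q_t on X_t and Q_{t−1}, following the transformed form just described:

import numpy as np

rng = np.random.default_rng(1)
T, lam, c0, c = 200, 0.5, 10.0, 2.0
X = rng.uniform(50, 150, T)

# Simulate Q_t = (1-lam)*c0 + (1-lam)*c*X_t + lam*Q_{t-1} + noise
Q = np.zeros(T)
Q[0] = c0 + c * X[0]
for t in range(1, T):
    Q[t] = (1 - lam) * c0 + (1 - lam) * c * X[t] + lam * Q[t - 1] + rng.normal(0, 1)

# Regress Q_t on [1, X_t, Q_{t-1}] to recover the coefficients of Eq. (2.48)
A = np.column_stack([np.ones(T - 1), X[1:], Q[:-1]])
coef, *_ = np.linalg.lstsq(A, Q[1:], rcond=None)
print("estimated retention rate:", round(coef[2], 3))   # should be close to 0.5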
In the geometrically distributed lag model, the short-term effect of the marketing effort is given by c_1, and a fraction α of the long-term impact of the marketing effort takes effect over log(1 − α)/log λ periods. The estimated values of parameters in discrete time will depend on the frequency of the time series. When repeated estimations are performed for comparative purposes, it is important to keep the data frequency constant between estimations. This issue is referred to in the literature under the concept of data interval bias (Leone 1995 and Clarke 1976). A number of other formulations along these lines are possible. For more details, the reader is referred to the references (Hanssens, Parsons and Schultz 2001 and Koyck 1954).

■ Marketing spending models and corporate finance

Here we discuss some ideas about how marketing spending models could be developed in the context of corporate financial objectives. It is commonly argued that the primary goal of a firm is to maximize shareholder value. This would suggest that the ultimate objective of a marketing plan is to maximize the equity value of the firm that undertakes the marketing plan. Before discussing the possible interactions between marketing effort and shareholder value, we must be precise about the meaning of maximizing shareholder value.

We will assume for the moment that the assets of the firm can be neatly divided between debt and equity, where equity holders absorb the vast majority of the firm's risk and are therefore expected to be rewarded with the highest returns. In reality, the capital structure of firms is much more complex than a clean partition into equity and debt, with components that share both equity- and debt-like features (such as convertible bonds and preferred stock).

Investors elect to invest in the equity of a particular firm because the equity of that firm exhibits a profile of risk versus return that they like. Modern portfolio theory tells us that, in the long term, a change in expected equity returns will go hand in hand with changes in the risk profile of the equity. The risk profile of the equity of a firm results from the combination of the market fluctuations of the shares of the firm and the correlation of those fluctuations with the rest of the market. The task of senior managers is to position the firm in such a way that its long-term equity growth is as high and stable as possible, consistent with the risk profile of its industry. This suggests that shareholders will benefit from the marketing effort indirectly, to the extent that the objectives of senior managers, which are aided by the marketing effort, are consistent with the interests of the shareholders. Next we examine a proposed framework for integrating a marketing effort with shareholder objectives.

A framework for corporate performance marketing effort integration

Earlier in this chapter, we discussed a way to capture environmental changes in the evolution of a time series, where the time series represents a measure of marketing performance, such as sales volume. We can extend this idea to capture the effect of the implementation of a marketing plan on the statistical properties of equity returns. In our case, the time series we observe are equity returns, and the sudden environmental change we wish to capture and quantify is the onset of the marketing plan. What precisely is the statistical property of equity returns that we aim to enhance through the marketing investment?
We address this question by invoking modern portfolio theory (MPT). The theory tells us that the fair, or risk-adjusted, return on an investment in a particular asset, R_asset, the return on a short-term risk-free security, R_f, and the return on the overall market, R_market, are related by the following expression (Luenberger 1996)

E[R_asset] = R_f + β_asset (E[R_market] − R_f)    (2.49)

where β_asset, known as the 'beta' of that particular asset, is defined as follows

β_asset = cov(R_asset, R_market) / var(R_market)    (2.50)

In the light of MPT, we can posit that the purpose of the marketing effort is to produce asset returns above the risk-free rate that exceed, or at least maintain, the fair returns predicted by Eq. (2.49). We start out by assuming that the current long-term returns of the firm at least adjust according to Eq. (2.49). To see whether the marketing spending does indeed have the desired effect, we can conduct the analysis we described in Section 2.1, where Q_t is now interpreted as the change in the realized return of the firm's equity over period t

R_t − R_{t−1} = c_0 + d_0 Z + c_1 X_t + d_1 Z X_t    (2.51)

A linear regression analysis of this representation tells us whether the marketing effort is driving returns above the risk-adjusted level of Eq. (2.49), or toward this level in case the equity is under-performing.

■ References

Clarke, D.G. Econometric measurement of the duration of advertising effect on sales. Journal of Marketing Research, 13: 345–357, 1976.
Hanssens, D.M., L.J. Parsons, and R.L. Schultz. Market Response Models: Econometric and Time Series Analysis, 2nd ed. Kluwer, Massachusetts, 2001.
Koyck, L.M. Distributed Lags and Investment Analysis. North-Holland, Amsterdam, 1954.
Leeflang, P.S.H., D.R. Wittink, M. Wedel, and P.A. Naert. Building Models for Marketing Decisions. Kluwer, Massachusetts, 2000.
Leone, R.P. Generalizing what is known about temporal aggregation and advertising carryover. Marketing Science, 14(3), 1995.
Luenberger, D.G. Investment Science. Oxford University Press, New York, 1996.

CHAPTER 3 Metrics Overview

In this chapter, we discuss the key metrics used for measuring and optimizing return on marketing investment. As we alluded to earlier, using the wrong metrics may have serious consequences for marketing returns, and may in fact drive a company out of business. Consider the following scenario.

John, marketing director at Sigma Corporation, tracks 75 metrics to measure the impact of his company website in generating online sales. He has two dedicated full-time staff members compiling reports on these metrics on a daily basis. In addition, he has a full-time web developer focusing on changing website features to increase web traffic. Tom is the marketing director of Huber Sigma Corporation, the main competitor of Sigma. Tom tracks one return metric (online sales revenue). In addition, he tracks 10 operational metrics about his company website. Over the past year, for the same product category, Sigma's online sales dropped 40%, while Huber Sigma's doubled. What could have possibly gone wrong with Sigma? The answer is that Tom focused on a few relevant key return and operational metrics, while John pursued every metric available without making any differentiation among them.
In addition, it was suboptimal for John to focus so much energy on web traffic alone in light of his objective of generating online sales, since web traffic is really an operational metric, not a return metric. This story also shows that focusing on the wrong metrics not only costs business, but also drains resources unnecessarily.

The following is a list of key questions and issues addressed in this chapter.

1. What are the common metrics used to measure returns and investments?
2. What is the most appropriate formula for return on investment?
3. What are the common challenges of tracking returns on investments?
4. What is the process for identifying appropriate metrics?
5. What are the stages in a sales cycle (a.k.a. sales funnel)?
6. What are the common marketing communication channels?
7. What are the key metrics in each stage of the sales cycle?
8. What is the difference between return metrics and operational metrics? How do we use both types of metrics to drive future campaign optimization?
9. What are the tips for addressing common ROI tracking challenges?

■ Common metrics for measuring returns and investments

In order to properly measure marketing returns on investment, we need to identify appropriate return metrics and investment metrics. We start out with a discussion of return metrics.

Measuring returns with return metrics

Essentially, a return metric is a variable that can be measured and used to quantify the desired end result of a marketing effort aimed at migrating a target audience from stage to stage in the sales cycle. If the purpose of a marketing effort is to move a target audience from being leads to becoming buyers, then return metrics measure and quantify buyers and their purchases. In this case, the number of buyers, the number of units sold, the total purchase amount, and the average purchase amount are examples of return metrics.

Return metrics can be expressed in either financial or nonfinancial terms. Financial return metrics are measured in dollars; realized revenue is an example. Nonfinancial return metrics are not measured in dollars, but are nevertheless important indicators of short-term marketing success. Examples of nonfinancial return metrics are the incremental increase in awareness, the number of responders to a marketing promotion, the number of leads (potential buyers), and the increase in customer satisfaction. In business-to-business situations, where the sales cycle is usually long, nonfinancial return metrics can be particularly important if the amount of investment in driving this type of short-term return is significant.

Take as an example the advertising market. The value of the overall advertising market in the US in 2006 was $292 billion (Mediaweek 2006). Most advertising dollars are spent on driving the awareness or perception of a brand, product, or service rather than on generating revenue in the short run. If we only account for financial returns from advertising in the short run, then the short-term returns of advertising may appear negligible. In reality, advertising can drive financial returns in the long run. Another reason for measuring short-term returns of marketing efforts is that marketing planning and execution sometimes have a shorter time frame than the sales cycle and are run and optimized on an ongoing basis. As a result, the effectiveness of a campaign needs to be determined before potential customers make any purchases.
Measuring investment with investment metrics

Investment metrics refer to investment costs, such as marketing spending. These costs can be either fixed or variable. Fixed costs occur regardless of the implementation of a particular marketing campaign. Examples of such costs include the costs of existing marketing staff members and the upkeep of office facilities that are spread across marketing and other corporate functions, such as sales, operations, and information technology. Fixed costs are not allocated to specific marketing programs. One marketing program manager might be responsible for an e-mail program for product A and a direct mail program for product B simultaneously, and it can be a challenge to estimate how much time this individual spends on a particular marketing program. Other examples of fixed costs are building lease costs and information technology support costs.

Variable costs are costs that can be accurately attributed to specific marketing programs. Such costs include media costs, agency costs, production costs, and postage for a specific marketing program.

One frequently asked question is which cost category should be included when accounting for marketing investment costs. If fixed costs can be attributed to specific marketing efforts without making assumptions that will lead to significant errors in cost allocation, then the correct thing to do is to consider both fixed and variable costs in estimating marketing investment. However, in situations where fixed costs cannot be accurately attributed to specific marketing efforts, the correct thing to do is to steer away from fixed costs and consider variable costs only. If fixed and variable costs are both imputed, then they must be applied to all marketing programs. If only variable cost is used, then variable cost must be applied to all marketing efforts that the firm undertakes. Consistency is an important consideration when deciding on the application of fixed and variable costs.

■ Developing a formula for return on investment

Return on investment (ROI) is defined as the following ratio

ROI = (Return − Investment) / Investment    (3.1)

The numerator is the difference between return and investment, and the denominator is investment. Investment is given by

Investment = Cost of Goods Sold + Cost of Capital + Marketing Investment + Additional Investment    (3.2)

In practice, in addition to immediately realized revenue, there are usually potential future revenue streams. The best way to accurately quantify this type of financial return is to compute the sum of the net present values of these streams of revenue. This is called the lifetime value (LTV) of a campaign and should be factored in as part of the total return of a marketing campaign whenever possible

LTV = NPV(sum of future revenue streams) = Σ_{i=0}^{n} R_i / (1 + d_i)^i    (3.3)

where R_i is the revenue at the end of time period i, d_i is the discount rate for time period i, and n is the total number of time periods in a lifetime.

Caution needs to be exercised when accounting for LTV across multiple marketing campaigns. If several marketing campaigns target the same audience during the same time period and they all account for the LTV of the same audience as returns, then we have a double-counting issue. In these circumstances, it is best to report the ROI at an aggregate level across the multiple campaigns by aggregating both the incremental return and the incremental marketing investment.
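A minimal sketch of the LTV calculation in Eq. (3.3), using invented revenue figures, a flat discount rate, and a hypothetical $2500 investment purely for illustration:

# Net present value of a stream of expected future revenues, Eq. (3.3)
revenues = [1200.0, 900.0, 700.0, 500.0, 300.0]   # hypothetical R_i for periods 0..4
d = 0.08                                          # assumed flat per-period discount rate

ltv = sum(r / (1.0 + d) ** i for i, r in enumerate(revenues))
roi = (ltv - 2500.0) / 2500.0                     # Eq. (3.1) with a hypothetical $2500 investment
print(round(ltv, 2), round(roi, 3))               # -> 3250.9 0.3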
The following formula incorporates the LTV concept into the formulation of the profit of a marketing effort (Leeflang, Wittink, Wedel, and Naert 2000). Note that we also need to compute the net present value of the costs of goods sold (COGS), the cost of capital, the marketing expenses, and any additional marketing expenses

Profit = Σ_{i=0}^{n} R_i/(1 + d_i)^i − Σ_{i=0}^{n} D_i/(1 + d_i)^i − Σ_{i=0}^{n} M_i/(1 + d_i)^i − Σ_{i=0}^{n} A_i/(1 + d_i)^i    (3.4)

where R_i is the revenue at time period i, D_i is the cost of goods sold at time period i, M_i is the marketing expense at time period i, A_i is the additional expense at time period i, and d_i is the discount rate for time period i.

■ Common ROI tracking challenges

It is a well-known fact in the marketing community that ROI is crucial but often difficult to track. The following are some common reasons behind the challenge of tracking ROI.

● No clearly defined business objectives and corresponding metrics
● Confusion over what a 'true' return is
● Confusion over true return metrics versus other variables such as operational metrics
● No access to prospect, customer, or sales data
● No system or process for integrating multiple data sources
● Information overflow: too much data and too little insight or intelligence
● Data quality issues: data is not clean and reliable
● Inability to quantify the marketing contribution to a sales transaction
● Inability to attribute sales to a particular channel, such as offline sales due to online marketing
● Long sales cycles hindering proper control of environmental factors and effective tracking
● No cost efficiency threshold to determine when to continue or stop spending
● No prioritization of metrics in terms of their importance
● Inability to quantify marketing impact on the company bottom line: marketing often perceived as a cost center.

The list of challenges can easily be extended. On a positive note, there is growing interest and determination in the marketing community to overcome these challenges wherever they occur. Throughout this book, we review means and tools that can help address some of the challenges listed above.

■ Process for identifying appropriate metrics

Figure 3-1 shows a step-by-step approach for identifying ROI metrics. This approach ensures that the selected metrics are well aligned with the business objectives and are able to track returns on investments effectively. The five steps are: (1) identification of the overall business objective; (2) understanding the impact of a marketing effort on target audience migration; (3) selection of appropriate marketing communication channels; (4) identification of appropriate return metrics by stage in the sales cycle; and (5) construction of ROI metrics with return metrics and investment cost.

Figure 3-1 Step-by-step process for metrics identification.

Identification of the overall business objective

A business objective is a desired outcome of a marketing effort. The following is a list of common business objectives.

● Increasing brand or product awareness in the target audience
● Educating the target audience
● Generating interest in particular products or services
● Generating leads
● Acquiring new customers
● Minimizing customer attrition or increasing customer loyalty
● Increasing revenue from existing customers by selling them additional products (cross-sell or up-sell)
● Increasing profitability
● Increasing customer satisfaction, renewals, or referrals
● Increasing market share and penetration.
Correct identification of the business objectives is crucial to the selection of appropriate metrics to effectively measure the success of a marketing investment. A very common mistake in marketing is a misalignment between business objectives and the metrics tracked. For instance, it often happens that the number of leads is tracked in the context of a brand awareness program. Brand awareness programs are meant to increase the awareness level among the audience, not to generate leads. Leads may be generated as a by-product of a brand program, but should not be used as the sole metric to judge the success of the program.

Understanding the impact of a marketing effort on target audience migration

After the business objectives have been determined, we must identify the target audience and where it is in the sales cycle. In general, there are five stages in a sales cycle (or sales funnel): awareness, interest and relevance, consideration, purchase, and loyalty and referral (Figure 3-2). The audience corresponding to these stages ranges from prospects, website visitors, inquirers, and responders in the early stages, through leads, to customers, high-value customers, satisfied customers, and advocates in the later stages.

Figure 3-2 A five-stage sales cycle.

The awareness stage

At this stage, prospects are exposed to information about companies, products, or services. This information could be a review of what they already know or completely new information. At this stage, we do not expect prospects to immediately make purchases. However, their understanding of a company, product, or service deepens at this stage and their likelihood of making a purchase later increases. There are different types of awareness, such as awareness of a brand, product, or service.

We will use the Prozac.com website as an example, illustrated in Figure 3-3. The website provides information about depression as a disease and about Prozac as a medicine. The 'Disease Information' section serves as a high-level education source on depression as a disease. The 'How PROZAC Can Help' section gives an overview of Prozac and how it can help depression patients. Both sections help raise awareness about depression and about Prozac as one of the drugs for mitigating depression. These two sections by themselves may not get readers to purchase Prozac right away, however. What is expected is that once depression patients build enough awareness and knowledge about the disease and the drug, they will be interested in making an inquiry at their doctor's office or through other channels.

Figure 3-3 Depression and Prozac awareness (Source: Prozac.com website 2006).

Our primary objective at the awareness stage is to accomplish an incremental degree of brand perception in the target audience. Increase in awareness, usually measured through survey studies, is a common return metric at the awareness stage.

The interest and relevance stage

At this stage, prospects may exhibit interest after their awareness (of a brand, product, or service) has been brought to a certain level.
They may feel that the product or service is relevant to their needs and preferences, and they may respond by requesting more information or filling out a survey.

We review the Prozac.com example again. Some web visitors will proceed to the 'Next Steps' section once they have built enough awareness of and interest in Prozac. There are five suggested actions under 'Next Steps': 'Asking your Doctor', 'Balanced Living', 'Support Organizations', 'Caring for Others', and 'Request More Information'. While 'Balanced Living', 'Support Organizations', and 'Caring for Others' are sections intended to further educate visitors, the other two sections, 'Asking your Doctor' and 'Request More Information', require some sort of action on a visitor's part. When a visitor takes an action, it means that he has passed the awareness stage and has moved on to the interest and relevance stage. Any metric that quantifies interest, such as the number of responders, is a suitable return metric for this stage.

The consideration stage

At this stage, customers or prospects exhibit sufficient interest to consider a purchase. They are willing to engage with sales or customer service teams in a dialogue about their needs and potential purchases. In the consideration stage, the audience is willing to invest more time in interactions with marketers than in the interest and relevance stage. In the Prozac.com example, sales leads can be generated through different scenarios. The most common scenario is through a doctor's prescription, since usually a doctor will prescribe a drug that suits a patient's physical and mental condition. A patient who has gone through the awareness stage and the interest and relevance stage will likely discuss the potential use of Prozac with his doctor. Any metric that quantifies consideration, such as the number of qualified leads, is an appropriate return metric for this stage.

The purchase stage

By the time a prospect reaches this stage, he has a clear need and has gathered sufficient information about a certain product or service. These prospects are likely to have gone through a trial of the product and are getting close to making an outright purchase. They are convinced that the product or service can address their needs and that they can afford it. In the Prozac example, after a patient has gone through the awareness stage, the interest and relevance stage, and the consideration stage, he is ready to make a purchase. The action of a purchase characterizes the purchase stage. At the purchase stage, we need to quantify buyers and purchases among the target audience. Common return metrics at this stage include the number of buyers, the number of transactions, total sales, and average sales per transaction.

The loyalty and referral stage

In this stage, customers are very satisfied with a product or service and can be viewed as loyal customers. They begin to spread positive 'word of mouth' (WOM) and actively refer others to the product or service. The importance of WOM cannot be overemphasized. WOM is particularly effective when large transactions and large investments on the part of the purchaser are involved. Customers at this stage are the most loyal ones and should be treated with extreme care. In fact, there is a new discipline of marketing, called social marketing, that capitalizes on customer referrals.
Examples of return metrics at this stage are customer satisfaction level, customer tenure, total historical purchase amount, the number of repeat purchases, the number of referrals, and revenue generated as a result of referrals.

So far, we have introduced the process of metrics identification and described the typical sales cycle and the target audience engaged at each stage of the sales cycle. Now we focus on the marketing communication channels that are best suited for each stage of the sales cycle. To some extent, the selection of marketing communication channels drives metric selection.

Selection of appropriate marketing communication channels

After identifying where to migrate the audience within the sales cycle, we need to leverage the most effective marketing communication channels to accomplish the migration. This section provides an overview of how various marketing communication channels are utilized. Marketing communication channels are classified as online or offline channels. Table 3-1 classifies the commonly used marketing communication channels by their online or offline nature.

Table 3-1 Key marketing communication channels (offline vs. online)
Online (Internet): Banner; Search; Online community; Website; E-mail (including electronic newsletter); Webinar
Offline: TV; Print (e.g., newspaper/FSI, magazine); Radio; Billboard; Physical store; Direct mail (e.g., catalog, postcard, letter, newsletter); Telemarketing; Trade show; Seminar

Another way to classify marketing communication channels is by whether they are mainly used for broad reach or for targeting. Table 3-2 shows a classification of key marketing communication channels in this manner.

Table 3-2 Key marketing communication channels (broad reach vs. targeting)
Broad reach: Banner; TV; Print (e.g., newspaper/FSI, magazine); Radio; Billboard; Physical store
Targeting: Direct mail (e.g., catalog, postcard, letter, newsletter); Telemarketing; Trade show; Seminar; E-mail (including electronic newsletter); Webinar; Search; Online community; Website

As a general rule of thumb, we use broad reach channels to communicate with a broad audience in the initial stages of the sales cycle, such as awareness, and apply targeting channels to target specific individuals in later stages. Table 3-3 shows how marketing channels are often used across the five stages in the sales cycle. Please note that some marketing communication channels may be used for both broad reach and targeting.

Table 3-3 Marketing communication channels by stage in the sales cycle
Awareness: TV; Print; Radio; Billboard; Direct mail; E-mail; Online banner; Search; Website
Interest and relevance: TV; Print; Radio; Billboard; Direct mail; E-mail; Online banner; Search; Website
Consideration: TV; Print; Radio; Billboard; Direct mail; E-mail; Search; Website; Telemarketing; Physical store
Purchase: Direct mail; E-mail; Search; Website; Telemarketing; Physical store
Loyalty and referral: Direct mail; E-mail; Online banner; Search; Website; Telemarketing; Physical store; Community

In what follows we give a summary overview of marketing communication channels.

Broadcast channels

TV, radio, billboards, newspapers, and magazines are often used to reach a broad audience to generate awareness. However, there are ways to target a more specific type of audience. In the case of newspapers and magazines, those who read the Wall Street Journal have a different demographic profile than those who read other papers. The Wall Street Journal caters to affluent professionals.
Another example is 'Parents' magazine, which appeals to an audience with children.

Online advertising

This group of media caters to either a broad or a more targeted audience. For example, the home page of Yahoo.com attracts a broad audience, while the various sections of the site attract more targeted audiences. Visitors to the Yahoo Finance site tend to be more interested in finance and investment, while those visiting Yahoo Travel are interested in travel. Next we provide an overview of online advertising (advertorial.org 2006 and iab.com 2008).

The content of online advertising can be text, standard graphics (GIF, Flash), or rich media. A rich media ad is a web ad that uses advanced technologies such as streaming video or an applet. Online ads also come in many different formats in terms of style and size.

Sponsored text links are one of the latest trends in online advertising. Although less flashy than rich media, text links are often perceived as content rather than advertising. The word 'advertorial' is a combination of two words, advertising and editorial. It refers to an advertisement written in the form of an editorial to give an appearance of objectivity.

A full banner (468 × 60) is the classic format (468 pixels in width and 60 pixels in height) and usually resides at the top of a web page. Even though newer, smaller formats are being utilized, this banner format still delivers some of the best results. The sheer size of this format gives it the ability to attract more attention.

In 2001, a group of new size formats was introduced to allow for a more flexible integration of online ads into web content. There are four common rectangular formats and one common square format.

● Rectangle: 180 × 150 pixels
● Medium rectangle: 300 × 250 pixels
● Large rectangle: 336 × 280 pixels
● Vertical rectangle: 240 × 400 pixels
● Square: 250 × 250 pixels

A leaderboard is a popular format whose name was originally used in sports. A leaderboard usually sits between the title area of a web page and its content. The standard size for a leaderboard is 728 × 90 pixels, and it can consist of text and animation.

Some newer formats have been developed to utilize the extra space of a web page or to make web pages more exciting. These newer formats include the skyscraper, interstitial, superstitial, floating ad, pop-up, pop-under, and rollover.

A skyscraper is an economical way of using web space. In contrast to traditional banners, which use horizontal space, a skyscraper uses vertical space. The standard formats of a skyscraper are 120 × 600 pixels and 160 × 600 pixels; the latter is called a wide skyscraper.

An interstitial ad is an ad that appears when a visitor clicks a link to a content page on a site. The visitor sees the interstitial ad before seeing the requested page. Interstitial ads need to be used very carefully, as visitors may find this type of ad intrusive.

A superstitial is an interactive (and sometimes entertaining) online ad with a flexible size. The first superstitial was designed for the Super Bowl event in 2000. Superstitials can have animation, sound, and even video elements, and they have the look and feel of a television ad. Like interstitial ads, superstitial ads are activated when a visitor goes from one page to another.

A floating ad is an online advertising format that is superimposed on a web page that a visitor requests. It is usually triggered either when rolling over an ad or when the content page loads.
It usually disappears automatically after 5–30 seconds.

A pop-up is a small window that is initiated by a single or double mouse click. The small window usually sits over one area of the web page that a visitor is viewing. A visitor can get rid of a pop-up by closing the small window. Pop-ups are often considered intrusive and should be used with caution. There are many pop-up blockers on the market now that block the activation of pop-up ads.

The pop-under ad is one of the latest innovations in online advertising. Unlike pop-ups, which block part of the content on a web page, a pop-under is a small window that appears under the main window of a site. In general, pop-under ads are considered less intrusive than pop-up ads.

A rollover ad is an interesting format that allows marketers to maximize the use of web real estate. Graphics or messages are displayed on the same banner whenever visitors rest the mouse over the banner for a moment, or when they click on the banner.

Search engine marketing

Search engine marketing (SEM) as a marketing communication channel has gained significant traction lately. SEM has shown measurable impact in generating responders, leads, and buyers. Those who search for certain keywords are actively seeking information on a particular subject and therefore are already in the interest and relevance stage of the sales cycle.

There are two types of listings: natural listings (also called organic or editorial listings) and paid listings. Natural listings are free. Search engines such as Yahoo and Google send out 'crawlers' or 'robots' to comb various websites and pages over the Internet and record relevant web pages in an index. If the content of a web page is relevant to a particular topic, then the web page will be indexed under that topic. When someone searches for that topic, this particular web page will be displayed, along with other web pages, in the search results.

In contrast, paid listings require advertisers to pay a fee to search engine companies. Search engine marketing allows an advertiser to promote a product or service by displaying the product or service description (usually called an ad copy) and a link as part of the search result listings. Advertisers bid on keywords for their products or services to appear in prominent positions in search results. Their ad copies are listed according to their ranking in the bidding. The higher an advertiser is willing to bid on a keyword, the better his ranking is likely to be. A top ranking ensures that an ad copy will appear at the top of the paid listing section. Advertisers are charged only when someone clicks on the paid listing, and they pay by the number of clicks. Advertisers do not pay if there is no click on their listings.

In the example illustrated in Figure 3-4, where a search for 'digital camera' is submitted, the listings appearing under 'Sponsored Links' at the top and on the right-hand side are paid listings. In this example, Dell has the top position for the keyword 'digital camera' in the sponsored links section at the top of the web page. Listings that are not under 'Sponsored Links' are natural listings.

Figure 3-4 Search results for 'digital camera'.

Corporate websites

Corporate websites have become an important marketing communication channel. A well-designed and well-implemented company website can serve multiple purposes across the sales cycle.
In the awareness stage, a company's website can be used to build brand awareness among its visitors and educate them about the company's products and services. In the interest and relevance stage, the site can be used as a venue for visitors to register for more information and opt in for newsletters. Onsite search functions provide visitors with additional convenience in locating the products that they are interested in and therefore move them further into the sales cycle. In the consideration stage, web registrants and opt-in members can be further screened for qualified leads and potential buyers. In the purchase stage, the site can serve as an e-commerce marketplace where buyers purchase directly from the marketer. Websites with a look-up feature for account history offer additional convenience for buyers to reorder the same products or to review their past purchases, and such a feature can further convert buyers into repeat buyers. In the referral stage, a website can offer blogs, online communities, forums, or chat rooms to solicit feedback and stimulate WOM marketing.

There is an emerging trend of turning a company website into an activity 'hub' that serves customers wherever they are in the sales cycle. According to a 2006 report (Webtrends 2006), 56% of the CMOs surveyed were using, or planning to use within one year, their company websites as the hub of their marketing strategy for building relationships.

Direct mail

Direct mail is one of the first media designed for one-to-one direct marketing. Direct mail addresses particular individuals or organizations, and its format varies from postcards and letters to catalogs. Usually there is a clear call to action for direct mail recipients. For example, a business reply card (BRC) may be enclosed for a recipient to fill out and return via a business reply envelope (BRE). Sometimes a web URL or a toll-free number is given in a direct mail piece for the recipient to visit a website or make an inquiry call. Those who fill out a BRC, visit the website, or call the toll-free number are called responders.

E-mail

Like direct mail, e-mail is an excellent medium for one-to-one marketing, soliciting responses, and generating leads. Compared with direct mail, e-mail is less formal but can be delivered faster and more cheaply.

Newsletters

Newsletters can be used in multiple stages of the sales cycle. They can be used to educate prospects and raise their awareness of a product, or to generate responses, leads, or sales. Furthermore, newsletters can also be used to cross-sell or up-sell additional products to existing customers. In terms of format, newsletters can be in either print or electronic form. Print newsletters addressed to specific individuals or organizations are a form of direct mail. Electronic newsletters addressed to specific individuals or organizations are usually in e-mail format.

Telemarketing

Telemarketing is another medium for one-to-one marketing beyond the awareness stage. This medium is usually more expensive than direct mail or e-mail. However, telemarketing allows human interaction and intervention in the process, which e-mail and direct mail do not. As a result, telemarketing can be more effective and is widely used as a lead qualification and follow-up tool.

Physical stores

Physical stores are currently where most sales transactions take place, and this is particularly true of high-value and high-consideration products.
Stores serve audiences across the various stages of the sales funnel and are places where prospects, potential leads, and existing customers congregate.

Tradeshows and seminars

Tradeshows and seminars are two marketing media mainly used for lead generation. Occasionally, marketers use tradeshows and seminars to educate prospects on complicated products or services as an awareness generation mechanism. However, the use of tradeshows and seminars for awareness generation is usually expensive.

Webinars

Webinars are electronic seminars that have recently gained in popularity. Webinars are cost-effective and can serve multiple purposes in the sales cycle. They can be used to educate prospects, solicit responses, or generate leads. Webinars also allow for real-time interaction and questions and answers (Q&A) and, as a result, make the engagement process more interesting and effective.

Identification of appropriate return metrics by stage in the sales cycle

There are five groups of return metrics, in alignment with the five stages of the sales cycle.

Return metrics at the awareness stage

This group of return metrics measures awareness of brands, products, or services. Some of these metrics, such as the number of recalls, are direct measures of awareness. Indirect (proxy) measures of awareness, such as the number of impressions or reaches, are used under the assumption that they will ultimately lead to awareness buildup. In some situations, it is difficult or expensive to directly measure the level of awareness, so indirect measures are used as an alternative to direct measures. For corporations with good name recognition, such as Cisco, Google, Microsoft, and Coca-Cola, brand equity value can be one additional measure of brand awareness. Brand equity value is the monetary value of a brand. Table 3-4 shows a list of the top twenty corporate brands. Table 3-5 shows common return metrics at the awareness stage.

Table 3-4 Top twenty corporate brands in 2005, with equity value in $ million (source: BusinessWeek 2005)
1. Coca-Cola, $67,525
2. Microsoft, $59,941
3. IBM, $53,376
4. GE, $46,996
5. Intel, $35,588
6. Nokia, $26,452
7. Disney, $26,441
8. McDonald's, $26,014
9. Toyota, $24,837
10. Marlboro, $21,189
11. Mercedes, $20,006
12. Citi, $19,967
13. Hewlett-Packard, $18,866
14. American Express, $18,559
15. Gillette, $17,534
16. BMW, $17,126
17. Cisco, $16,592
18. Louis Vuitton, $16,077
19. Honda, $15,788
20. Samsung, $14,956

Table 3-5 Common return metrics at the awareness stage
Direct return metrics: number of recalls; number of media mentions; awareness increase measured by survey
Proxy return metrics: number of impressions; number of reaches; brand value/equity

Return metrics at the interest and relevance stage

Genuine interest from prospects emerges when their awareness level reaches a critical point. The target audience at this stage is engaged
● Number of website visitors
● Number of unique website visitors
● Number of new website visitors
● Number of repeat website visitors
● Number of website page views
● Number of clicks on a particular link
● Number of responses to website offers.

In defining these metrics, care must be taken that the quantities used properly reflect the dimensions involved in the marketing program. For instance, the number of website visitors may refer to a particular location or a specific period of time. The next example shows the use of website visitor data to measure the return of a website at the interest and relevance stage. Company A launched a website featuring its new product at the end of September 2005. The main purpose of the site was to increase interest in the new product. The company tracked the number of unique visitors as the return metric with a web analytics tool. Table 3-6 shows a steady growth in the number of unique visitors to the site.

Table 3-6 Unique visitors to company A's website

Number of unique visitors in October 2005      85,006
Number of unique visitors in November 2005     110,193
Number of unique visitors in December 2005     134,500

Return metrics at the consideration stage
This group of metrics measures the effectiveness of marketing programs in generating leads. Leads are defined as those who have sufficient awareness of, and interest in, a particular product or service to contemplate making a purchase. The number of leads is a common return metric at this stage.

Return metrics at the purchase stage
This group includes metrics such as the number of transactions, sales revenue, and average purchase amount per transaction (a.k.a. AOV, average order value). Here are some common return metrics at the purchase stage.

● Number of transactions
● Number of buyers
● Total revenue
● Average value per transaction.

The next example illustrates the return metrics at the purchase stage. Cell phone company A launched an e-mail program targeting 50,000 existing customers to persuade them to renew their phone service plans at an annual subscription fee of $600. The e-mails had a link to a website where customers could renew their subscriptions online. Five hundred of those targeted by the e-mail renewed their phone plans online, resulting in direct sales revenue of $300,000. The investment cost of the program was $40,000 and the net profit was $260,000. Company A achieved a $300,000 direct sales return with an investment of $40,000. We summarize these statistics in Table 3-7.

Table 3-7 Return metrics of company A's e-mail program

Return metric at purchase stage    Value
Number of transactions             500
Number of buyers                   500
Total revenue                      $300,000
Average value per transaction      $600

Company A was not able to capture offline sales resulting from the e-mail program. Offline sales occurred when customers received the e-mail but decided to call to renew instead of doing so online.
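To make the arithmetic in this example explicit, the short Python sketch below recomputes the purchase-stage metrics from the figures above. The ROI line uses one common definition, net profit divided by investment, which may differ slightly from the exact formula developed earlier in the chapter; the variable names are ours.

```python
# Purchase-stage return metrics for company A's e-mail renewal program (figures from the example above).
targeted = 50_000          # customers who received the e-mail
renewals_online = 500      # transactions completed online
fee_per_renewal = 600      # annual subscription fee, in dollars
program_cost = 40_000      # total investment in the program

total_revenue = renewals_online * fee_per_renewal        # $300,000
average_order_value = total_revenue / renewals_online    # $600 (AOV)
net_profit = total_revenue - program_cost                # $260,000
roi = net_profit / program_cost                          # 6.5, i.e. 650%
online_renewal_rate = renewals_online / targeted         # operational metric: 1%

print(f"Total revenue: ${total_revenue:,}")
print(f"AOV: ${average_order_value:,.0f}")
print(f"Net profit: ${net_profit:,}  ROI: {roi:.0%}")
print(f"Online renewal rate: {online_renewal_rate:.1%}")
```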
Return metrics at the loyalty and referral stage
This group of metrics measures the depth of the relationship between a marketer and its customers. Some examples of this type of metric are customer retention rate, length of customer tenure, the number of purchases per year, and customer lifetime value. Loyal customers can refer people to the brands that they are familiar with and, as a result, generate potential future business for those brands. Given the popularity of blogs and other types of online communities, loyalty and referral metrics are expected to exert increasing influence on purchase behavior. Here is a list of common return metrics at the loyalty and referral stage.

● Lifetime value (LTV)
● Purchase frequency
● Tenure (length of time since becoming a customer)
● Number of referrals
● Revenue due to referrals
● Customer testimonials
● Customer satisfaction.

There are other metrics, called proxy metrics, that measure loyalty or referral indirectly. Customer satisfaction is an example of a proxy metric used under the assumption that satisfied customers are more loyal. Customer satisfaction is a very important metric in major corporations, and is often used as one of the criteria for measuring marketing executives' performance and compensation. The most common method for acquiring customer satisfaction data is survey analysis. Next we discuss an example illustrating customer satisfaction survey analysis.

Company A, a consumer electronics manufacturer, recently revamped its website to enhance the visitor experience. New features of the site include a store locator page with information on the nearest store locations and store phone numbers, a shopping cart for online direct purchases, and pages of promotional offers. The company ran an online survey prior to and after the new site launch to gauge how customer satisfaction may have changed due to the new site features. Satisfaction was measured on a scale from one, not satisfied at all, to nine, extremely satisfied. The survey results show that the average visitor satisfaction increased from 5.2 to 6.4, a 23% improvement at an 80% statistical confidence level. One frequently asked question is how one can tie customer satisfaction improvement to incremental revenue. Surveys are usually conducted in an anonymous fashion and their results cannot be easily mapped to the revenue-generating customer base. We need to measure customer satisfaction and revenue consistently in the same audience and apply data mining analysis to determine the level of correlation between customer satisfaction and revenue.
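Whether a lift like the one in this example (from 5.2 to 6.4) is statistically meaningful is typically checked with a two-sample test on the raw ratings. The sketch below is illustrative only: the rating lists are fabricated stand-ins for the pre- and post-launch survey responses (the book does not reproduce the raw data), and SciPy's Welch t-test is used as one reasonable choice of test.

```python
# Illustrative satisfaction comparison; the rating lists below are placeholders,
# on the 1 (not satisfied at all) to 9 (extremely satisfied) scale.
from scipy import stats

before = [5, 6, 4, 5, 6, 5, 7, 4, 6, 5, 5, 6]
after  = [6, 7, 6, 5, 7, 6, 8, 6, 7, 6, 6, 7]

mean_before = sum(before) / len(before)
mean_after = sum(after) / len(after)
lift = mean_after / mean_before - 1

t_stat, p_value = stats.ttest_ind(after, before, equal_var=False)  # Welch's two-sample t-test

print(f"Average satisfaction: {mean_before:.1f} -> {mean_after:.1f}  (lift {lift:.0%})")
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")  # a small p-value indicates the lift is unlikely to be chance
```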
■ Differentiating return metrics from operational metrics
So far, we have identified an appropriate ROI formula and the key return metrics across the five stages in the sales cycle. However, it is still a common challenge to distill the enormous amount of available marketing data. This challenge arises primarily from the difficulty in differentiating return metrics from operational metrics. The key difference between these two types of metrics is that the former indicates an end result while the latter focuses on a process. Operational metrics track the footprints of an audience as they migrate from one stage to the next, or within the same stage, in a sales cycle. The majority of marketing metrics are in fact operational metrics. To better illustrate the difference between return and operational metrics, we will go through some exercises to identify appropriate return and operational metrics in the sales cycle.

If the desired end result is to move the audience from the awareness stage to the interest and relevance stage, and the marketing communication channel is an online banner for generating clicks on ads, then the return metric should be the number of clicks. The operational metrics are those that measure how effectively impressions turn into clicks, and the click-through rate is an example of an operational metric. It is a common mistake to treat the click-through rate as the return metric.

If the desired end result is to move the audience from the interest and relevance stage to the consideration stage, and the marketing communication channel is direct mail for generating leads, then the appropriate return metric is the number of leads. The operational metrics are those that measure how effectively responses turn into leads, such as the response-to-lead conversion rate.

If the desired end result is to move the audience from the consideration stage to the purchase stage, and the marketing communication channel is outbound phone follow-up for generating sales, then the appropriate return metrics are the number of buyers, revenue amount, and profit. The operational metrics are those that measure how effectively leads turn into buyers, such as the lead-to-buyer conversion rate.

A best practice in campaign reporting is to clearly show the distinction between return metrics and operational metrics. In a campaign performance report, return metrics need to be placed in more prominent positions than operational metrics.

■ References
Leeflang, P.S.H., D.R. Wittink, M. Wedel, and P.A. Naert. Building Models for Marketing Decisions. Kluwer, Massachusetts, 2000.
Marketer's Guide to Media. Mediaweek, New York, 2006.
The advertorial.org website. Montreal, Canada (http://www.advertorial.org).
Webtrends CMO Web-Smart Report. Webtrends, Portland, OR, 2006.
The Interactive Advertising Bureau website (http://www.iab.net), New York, 2008.

CHAPTER 4
Multi-channel Campaign Performance Reporting and Optimization

In this chapter, we focus on the tracking and analysis of marketing returns from multi-channel campaigns. Marketers often use more than one communication channel in a campaign because different communication channels tend to appeal to different segments within a target audience. In the high-tech business-to-business marketplace, for instance, marketers often use direct mail to target business decision-makers (BDM) and online channels to target technical decision-makers (TDM).

■ Multi-channel campaign performance reporting
It is a constant challenge to report campaign performance on multiple communication channels. The challenge is compounded by a lack of clarity about the different roles that different metrics play. Return metrics can be the same across different channels, but operational metrics are often channel-specific. Figure 4-1 illustrates a systematic thought process for identifying appropriate metrics for multi-channel reporting.

Step 1: Identify all marketing communication channels and their associated cost and target volume.
Step 2: Identify the overall return or success metrics. Aggregate and track these metrics across channels.
Step 3: Select marketing channel-specific return or success metrics.
Step 4: Identify operational metrics by marketing channel.
Step 5: Uncover operational metrics highly associated with channel return.

Figure 4-1 Metrics identification process for multi-channel campaign reporting.

The first step is to identify all the marketing communication channels in the campaign and to track their associated cost and target volume. Consider the following example: Company A launched a multi-channel campaign with the objective of generating leads to move its target audience to the consideration stage of the sales cycle.
The campaign consisted of four channels: direct mail, e-mail, online banners on external websites, and paid search. The cost and the target volume of the four channels are detailed in Table 4-1.

Table 4-1 Cost and target volume of a multi-channel campaign of company A

Marketing communication channel    Total marketing cost (agency and media costs)    Target volume
Direct mail                        $500,000                                         1,000,000 pieces mailed
E-mail                             $300,000                                         400,000 e-mails delivered
Online banners                     $250,000                                         125,000,000 impressions
Paid search                        $750,000                                         500,000 clicks
Total                              $1,800,000                                       –

The second step is to identify the overall return or success metrics of the campaign, to aggregate these metrics across all marketing communication channels, and to track them. In the example of company A, the overall return or success metric is the number of leads generated by the campaign. However, some channels might have purposes beyond driving the number of leads. For instance, online banners are used to raise awareness as well as to generate leads. We can roll up the number of leads from online banners and the other channels to derive the total number of leads for the campaign, but it is important to remember the additional purpose of each channel. At this stage, it is important to calculate the overall returns and the returns by channel. Channels with higher returns should be invested in more heavily. When the return is measured in nonfinancial terms, such as number of leads, then the cost per lead is the metric to optimize. Table 4-2 shows the key return or success metrics for the company A campaign.

Table 4-2 The rollup of return metrics

Marketing communication channel    Total marketing cost (agency and media costs)    Leads     Cost per lead
Direct mail                        $500,000                                         5000      $100
E-mail                             $300,000                                         9000      $33
Online banner                      $250,000                                         1250      $200
Paid search                        $750,000                                         10,000    $75
Total                              $1,800,000                                       25,250    $71

The third step in selecting appropriate metrics is to identify the channel-specific return or success metrics. These channel-specific return or success metrics can be viewed as mini goals, or intermediate return or success metrics. In the example of company A, the ultimate return or success metric of the campaign is the number of leads. Within the online banner channel, the number of responses and the number of clicks can be considered intermediate return or success metrics for this particular channel.

The fourth step is to identify all potential operational metrics by marketing communication channel. Operational metrics are usually channel-specific and may not be rolled up across channels. For example, click-through rate is an operational metric that only applies to online channels and cannot be applied to direct mail. Since there may be hundreds of operational metrics, it is important to take the additional step of identifying those with the highest impact on the returns of the channel.

The fifth and last step in the identification process is to uncover operational metrics highly associated with channel returns. This is where data mining is extremely useful and should be fully leveraged. Appropriate adjustments in the values of operational metrics within each channel can maximize the overall returns of the campaign. For example, in the case of online banners, the number of impressions, click-through rate, and response rate are potential operational metrics. By improving any of these three operational metrics, company A can increase the number of leads generated by its online banners and therefore increase the number of leads generated by the overall campaign.
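A minimal sketch of the Table 4-2 rollup, using the per-channel costs and lead counts above; the dictionary layout and variable names are ours, not from the book.

```python
# Cost-per-lead rollup for company A's multi-channel campaign (figures from Table 4-2).
channels = {
    "Direct mail":   {"cost": 500_000, "leads": 5_000},
    "E-mail":        {"cost": 300_000, "leads": 9_000},
    "Online banner": {"cost": 250_000, "leads": 1_250},
    "Paid search":   {"cost": 750_000, "leads": 10_000},
}

total_cost = sum(c["cost"] for c in channels.values())
total_leads = sum(c["leads"] for c in channels.values())

for name, c in channels.items():
    print(f"{name:<14} cost per lead: ${c['cost'] / c['leads']:.0f}")
print(f"{'Campaign':<14} cost per lead: ${total_cost / total_leads:.0f}")   # ~$71
```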
Tables 4-3 through 4-6 show the common operational metrics for direct mail, e-mail, online banners, and paid search from the awareness stage to the conversion stage of the sales cycle.

Table 4-3 Common operational metrics for direct mail from the awareness stage to the conversion stage

Marketing communication channel    From awareness to interest and relevance        From interest and relevance to conversion
Direct mail                        Response rate (responses/mail quantity)         Lead conversion rate (leads/responses)

Table 4-4 Common operational metrics for e-mail from the awareness stage to the conversion stage

Marketing communication channel    From awareness to interest and relevance                             From interest and relevance to conversion
E-mail                             Open rate (e-mails opened/e-mails delivered)                         Lead conversion rate (leads/responses)
                                   Click-through rate (clicks on links in e-mails/e-mails delivered)
                                   Response rate (responses to an offer/e-mails delivered)

Table 4-5 Common operational metrics for online banners from the awareness stage to the conversion stage

Marketing communication channel    From awareness to interest and relevance                 From interest and relevance to conversion
Online banner                      Click-through rate (clicks on banner/impressions)         Lead conversion rate (leads/responses)
                                   Response rate (responses to an offer/clicks on banner)

Table 4-6 Common operational metrics for paid search from the awareness stage to the conversion stage

Marketing communication channel    From awareness to interest and relevance                            From interest and relevance to conversion
Paid search                        Click-through rate (clicks on webpage links/impressions)             Lead conversion rate (leads/responses)
                                   Response rate (responses to an offer/clicks on webpage links)

■ Multi-channel campaign performance optimization
The purpose of campaign performance optimization is to maximize the cost-effectiveness of a return metric by leveraging what can be learned from the past. Campaign optimization can be viewed as an extension of campaign reporting. Additional steps to optimize future campaigns need to be taken after the five-step campaign performance reporting process introduced in the previous section is completed, as shown in Figure 4-2.

Step 1: Identify metrics for optimization.
Step 2: Determine optimization timeframe, frequency, and tools.
Step 3: Identify key operational metrics with the highest impact on the optimization metrics.
Step 4: Identify factors that influence the values of key operational metrics.
Step 5: Apply learning to future campaign planning and execution.

Figure 4-2 Campaign optimization process.

The first step is to identify the metrics to optimize. Optimization metrics should be aligned with the overall campaign return or success metrics. The rationale is apparent: performance optimization aims to cost-effectively increase the value of the return or success metrics. In the example of company A, the overall return or success metric is the number of leads. Company A can keep spending money to generate more leads, but at some point it will no longer be cost-effective to do so given the potential diminishing returns. The solution is to maximize the number of leads with a cost-per-lead threshold in place. How can company A derive this cost efficiency threshold?
One way is to derive the expected average profit per lead and make sure that the cost per lead never runs above the expected profit.

The second step is to determine the optimization time frame and the tools required for optimization. How frequently an optimization strategy is revisited depends on the marketing communication channel utilized. Sufficient time needs to pass before any conclusions about the strategy are drawn. The time frame and frequency of optimization should be close to the time frame required for a marketing communication channel to achieve its full result. For example, results on metrics such as clicks, click-through rate, and responses for banner, search, and e-mail can usually be measured close to real time with appropriate online analytic tools. In this case, campaign performance can be tracked in real time and optimized accordingly. Marketing dollars can be shifted from under-performing media sites to over-performing ones. In contrast, direct mail requires a much longer time period (one month or more) to generate final response results. In recent years, there has been significant advancement in analytic and optimization tools, including management and optimization tools for the web, ad serving, search, campaign management, lead and sales tracking, as well as customer relationship management (CRM). Before full-scale deployment of any analytic and optimization tool, it is important to test the tool through a business proof-of-concept pilot with the tool vendor.

The third step in campaign optimization is to analyze the data collected so far and to identify key operational metrics based on their impact on the optimization metrics. In the example of company A, it is clear that the company needs to optimize the number of leads acquired cost-effectively. To accomplish this, we need to identify the most important contributors to the number of leads. The response-to-lead conversion rate and the number of responses are examples of these contributors. An increase in the response-to-lead conversion rate at a given cost and a given number of responses will result in an increase in the number of leads. Alternatively, an increase in the number of responses at a given cost and a given response-to-lead conversion rate can result in an increase in the number of leads. Discovering these influential factors is crucial for optimizing future campaign performance. There are occasions when it is not obvious which factors are influential. When this is the case, we can use data mining techniques to uncover hidden relationships. Data mining techniques such as Classification and Regression Trees (CART) can be used to analyze the relationship between potential influential factors and the number of leads acquired at a given cost. Logistic regression is another data mining technique that can be used to build models to target those who are more likely to convert to leads. Chapter 7 discusses various data mining techniques that can be leveraged to uncover hidden relationships.
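As a rough illustration of the logistic-regression approach just mentioned, the sketch below fits a lead-conversion model on a handful of made-up responder records. The feature names and data are placeholders, and scikit-learn is used only as one convenient implementation; it is not the book's own tooling.

```python
# Illustrative lead-scoring sketch: all records and feature names are placeholders.
from sklearn.linear_model import LogisticRegression

# Each row: [opened_email (0/1), clicked_link (0/1), prior_purchases]
X = [[1, 1, 2], [0, 0, 0], [1, 0, 1], [1, 1, 0], [0, 1, 3], [0, 0, 1], [1, 1, 4], [0, 0, 0]]
y = [1, 0, 0, 1, 1, 0, 1, 0]   # 1 = responder became a qualified lead

model = LogisticRegression()
model.fit(X, y)

# Score a new responder; high-propensity responders can be prioritized for follow-up.
new_responder = [[1, 1, 1]]
probability = model.predict_proba(new_responder)[0][1]
print(f"Probability of converting to a lead: {probability:.2f}")
```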
The fourth step in campaign optimization is to identify attributes that can be manipulated to influence the values of key operational metrics. In the previous example of an online banner, marketing messaging is an attribute that may be changed to drive the response-to-lead conversion rate and hence the number of leads. Where can we find potential influential attributes? Data pertaining to any of the following areas can be candidates for such factors:

● Target-audience characteristics such as lifestyle and socioeconomic status
● Stages in the sales cycle such as the awareness stage and the conversion stage
● Attributes of marketing communication such as creative and messaging
● Marketing and sales operations such as customer service and fulfillment
● Features of marketing campaigns such as rebates and discounts.

In the fifth step, the learning from the current campaign should be applied to future campaigns to optimize marketing planning and execution. Ongoing tests and learning environments can lead to optimal marketing efforts and to sustained high returns on marketing investment.

Uncovering revenue-driving factors
Revenue is often the return or success metric that most marketing executives choose to focus on. Given the importance of revenue, we will go over some common practices on how to best identify revenue-driving factors. The key to understanding revenue-driving factors is to understand the target audience and where they are in the sales cycle. Customer segmentation is the tool for understanding the target audience. There are numerous ways of segmenting and profiling a customer base. The following are some common practices for uncovering revenue opportunities by segmentation.

● Segmentation of existing customers by value: The value of a customer is defined as the revenue generated by the customer over a period of time. Potential future revenue opportunities can be better identified as a result of this type of value segmentation. Marketing dollars need to be allocated to those customer segments with high growth potential to maximize revenue. In addition, cross-selling and up-selling can be leveraged to increase revenue. In Chapter 7, we will discuss a common cross-sell and up-sell analytic technique called association analysis (market basket analysis).
● Segmentation of the target audience by share of wallet: Share of wallet is defined as the total spending on a brand over the total spending on the category that the brand is under. Consider the supermarket business as an example. Customers who spend most of their grocery dollars with a particular supermarket brand (let us call it supermarket A) are the primary customers of this supermarket brand. In other words, supermarket A owns a large share of wallet of these customers. Those customers who spend most of their grocery dollars at competitors' stores but also shop at this supermarket brand are its secondary customers with a small share of wallet. Supermarket A can increase its revenue by either increasing the share of wallet of its secondary customers or increasing the purchase amounts of its primary customers.
● Segmentation of the target audience by likelihood to buy: We need to know where the various audiences are in the sales cycle. Do they barely know the brand and products? Do they know the products so well that they are ready to make purchases? Based on the insight gleaned from segmentation, different types of marketing programs can be created to target different subsegments in the audience. For example, awareness programs are used to educate those at the awareness stage. Lead generation programs are used to target those who are ready to purchase. Data mining techniques such as logistic regression can be leveraged for target marketing. These techniques will be discussed in detail in Chapter 7.
● Segmentation of the target audience by needs: Marketing the right products to the right customers increases the returns of marketing programs. Programs targeting specific audiences with specific products are more effective than generic programs.

In summary, success in multi-channel marketing campaigns requires consistent focus on the appropriate metrics and analysis of the interrelationships between these metrics. Reporting and optimization, two seemingly tactical areas of marketing, can often drive important marketing investment strategies.

CHAPTER 5
Understanding the Market through Marketing Research

Marketing research is a powerful vehicle for uncovering and assessing market opportunities. In particular, it is an effective tool for addressing the following three sets of questions to ensure effective marketing investment planning.

● Where is the market opportunity? What is the size and growth rate of the opportunity?
● Who is the target audience? What are their profiles and characteristics?
● Why do consumers or businesses choose one product over another? Why do they choose one brand over another?

In this chapter, we give an overview of marketing research and its applications to enhancing marketing returns. We start with a synopsis of the application of marketing research to understanding the market and then discuss marketing research as a discipline.

■ Market opportunities
Understanding potential market opportunities is the first step in marketing investment planning. Solid knowledge of the market structure and market opportunities minimizes risk and increases returns on marketing investment. A market opportunity can be described by the following parameters.

● Market size
● Market growth
● Market share.

Market size
One way to describe and quantify a market opportunity is through market size information. Market size information can be segmented by attributes such as geography or industry. Syndicated research companies, such as IDC, Gartner, Forrester, Hitwise, Nielsen Media Research, Jupiter Research, and comScore, provide market size information for standard products. For nonstandard products or new products, customized research is required for gathering market size information. Customized research is usually more expensive than syndicated research. The following is an example of syndicated market size research and data. In the last few years, search marketing has grown rapidly as a marketing and advertising vehicle. Online search marketers are always interested in their keyword rankings (positions) with key search engines. Based on a comScore report (comScore Networks, 2006), the total number of searches in the US grew from 4.95 billion in January 2005 to 5.48 billion in January 2006, a growth rate of 10.7%. Table 5-1 shows that as of January 2006, Google had the largest market share (41.4%), followed by Yahoo (28.7%), MSN (13.7%), Time Warner Network (9.6%), and Ask Jeeves (5.6%). This market share information is important for determining where to invest to maximize exposure to potential customers. The search engines with the largest search market size are usually the most attractive marketing and advertising partners for search marketers.
Table 5-1 Total Internet searches and share of online searches by search engine (Source: comScore 2006)

                                Jan. 2005         Jan. 2006
Total Internet searches         4.95 billion      5.48 billion

Share of searches by engine     Jan. 2005 (%)     Jan. 2006 (%)
Google sites                    35.10             41.40
Yahoo! sites                    31.80             28.70
MSN–Microsoft sites             16.00             13.70
Time Warner network             9.60              7.90
Ask Jeeves                      5.10              5.60

Market size terminology
We must differentiate between the total available market and the total addressable market. Syndicated research companies often provide market size information on the total available market. The total addressable market is a subset of the total available market. Due to a variety of factors, companies may only have access to a subset of the total available market. This subset is called the addressable market. For example, a company with no infrastructure in Asia cannot sell into this geographic portion of the market. Therefore, for this particular company, the total addressable market is the total available market minus the Asian market.

Factors that impact market-opportunity dynamics
Many factors can impact market opportunity and its growth. Understanding these factors allows for better marketing planning and more effective buy-in from marketing executives. The most important factors are macroeconomic trends, emerging technologies, and customer needs.

Impact of macroeconomic factors on market opportunities
A large number of elements within the macroeconomy affect market dynamics. We explore the most significant ones: Gross Domestic Product (GDP) growth, geopolitical factors, oil prices, exchange rates, interest rates, unemployment rates, product life cycle, and corporate profits.

● GDP growth: The growth of GDP not only is an indicator of market growth but also affects confidence in the marketplace and can drive subsequent growth. In other words, GDP growth is both a reflection and a potential driver of future market growth. According to the Bureau of Economic Analysis (2006, http://www.bea.gov/glossary/glossary.cfm), the GDP of a country is the market value of goods and services produced by labor and property in that country, regardless of the nationality of the labor and of those who own the property. The Gross National Product (GNP) of a country is the market value of goods and services produced by labor and property supplied by the residents of that country, regardless of where they are located. GDP replaced GNP as the primary measure of US production in 1991. GDP is a composite measure based on various types of goods and services. Since GDP is a composite of growth in the various sectors of the economy, the growth of the larger economic sectors, such as manufacturing, financial services, and government spending, tends to have more influence on overall GDP growth. Consider the so-called nonresidential equipment and software investment sectors. Figure 5-1 shows that GDP was highly correlated with the nonresidential equipment and software investment sectors from Q1 2005 to Q1 2006. It is also noticeable that the nonresidential equipment and software investment sectors tend to have wider swings than GDP. This is likely due to the after-shock effects of GDP data releases, suggesting that a boost in GDP itself tends to raise confidence in the marketplace and thereby indirectly boosts subsequent investment in the two sectors.
● Political uncertainty: Political uncertainty, like economic uncertainty, tends to trigger or hold back business investments and consumer spending.
The war in Iraq and the kidnappings of foreigners in the Middle East, for instance, have made investors think twice about their involvement in rebuilding the region. The threat of terrorism (including cyber terrorism) has boosted US government spending on defense and security since 2001, as illustrated in Figure 5-2.

Figure 5-1 GDP versus nonresidential equipment and software investment, quarterly growth (%), 2005:Q1 to 2006:Q1. Source: The Bureau of Economic Analysis, 2007.

Figure 5-2 US GDP versus federal national defense investment, annual growth (%), 2001 to 2003. Source: The Bureau of Economic Analysis, 2007.

● Oil prices: Oil prices usually fluctuate with the political climate in oil-exporting areas such as the Middle East, Latin America, and Africa. The Organization of Petroleum Exporting Countries (OPEC) often adjusts its oil production level based on geopolitical factors. When oil prices go up, the costs of producing goods go up and investment is scaled back.
● Exchange rate: The exchange rate has an effect on imports and exports, which in turn affect GDP growth. A weaker currency benefits exports and GDP if all other factors are kept constant, though it can also result in increased inflationary pressures through higher import prices; a stronger currency inhibits exports.
● Interest rate: Higher interest rates usually have a deterring effect on capital and consumer spending as borrowing costs increase.
● Unemployment rate: The unemployment rate is a reflection of corporate spending and hiring. An increase in unemployment is an indication of weak confidence in the marketplace and a slowdown in the rate of business expansion.
● Product life cycle: Product life cycle is another factor that influences market opportunities. When a product is approaching the end of its life, the market tends not to invest in this product and, as a result, its market size shrinks. Customers tend to avoid investing in a product about to become obsolete, and often prefer to wait for the next generation of the product, whose market size may then grow over time.
● Corporate profits: An increase in corporate profits usually has a positive impact on corporate spending in the long run, if not in the short run. Corporations are ready to spend more when executives feel comfortable with the state of business in their firms. Investment banks and research firms regularly survey CEOs, CFOs, and CIOs to gauge their feelings about the economic climate.

All of the factors discussed above have either positive or negative impacts on market growth. Therefore, paying close attention to these factors is extremely important. A number of marketing research companies designate analysts and experts to analyze these factors on a regular basis and compile market size forecasts based on these factors.
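The co-movement noted under Figure 5-1 can be checked numerically once the two quarterly growth series are in hand. The sketch below shows the computation only; the growth figures are illustrative placeholders, not the BEA data, and statistics.correlation requires Python 3.10 or later.

```python
# Pearson correlation between GDP growth and a sector's investment growth.
# The two series below are illustrative placeholders, not the BEA figures.
from statistics import correlation

gdp_growth = [3.4, 3.3, 4.2, 1.8, 4.8]                 # quarterly growth, %
equipment_software_growth = [6.0, 9.0, 10.5, 4.0, 14.0]

r = correlation(gdp_growth, equipment_software_growth)
print(f"Pearson correlation: {r:.2f}")   # values near 1 indicate the series move together
```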
Impact of emerging technologies and customer needs on the market
Technology breakthroughs often have an impact on market growth, although the growth may be initially small as investors wait for full-scale adoption of the new technology. It is important to track emerging technologies or products that may replace existing technologies or products (and eventually eliminate current markets), supplement existing technologies or products (and thereby impact a current market either positively or negatively), or create completely new markets. For example, adoption of radio frequency identification (RFID) in the retail market has driven demand for this new technology. At one point, Wal-Mart, the largest US retailer, requested some of its suppliers to become RFID-compliant by 2005. This created a sizeable market for RFID products and services.

The creation of the credit card market, which is now a large financial market, is the result of customers' demand for convenience. Diners Club introduced the first credit card in the 1950s. American Express and Bank of America started issuing their cards in 1958. Over the years, the credit card became indispensable to most consumers and businesses, and as a result, a new and large financial market emerged. The size of credit card receivables in 2001 was over $600 billion in the US.

Market growth trends
Industry or technology analysts often express market growth for the following years (usually a total of five years) with a metric called the compound annual growth rate (CAGR). The standard formula for computing CAGR is as follows:

CAGR = (X_e / X_b)^(1/(t-1)) - 1    (5.1)

where X_e is the market size forecast for time period t, X_b the market size forecast for time period 1, and t the number of years in the forecast time period.

Market share
Market share data indicates how well a company is positioned in a particular market. Those participants with the highest market shares are market leaders, and those with the lowest market shares are market laggards. Low market share indicates an opportunity for growth. A firm with a large market share will find it harder to grow further and may seek or create another market. Market share can be expressed in terms of units sold or in dollar amount. The market share of a company during a given time period, measured in dollar amount, is

Market share = Revenues of the company / (Revenues of the company + Total revenues of its competitors)    (5.2)

Both the denominator and the numerator are in dollar amounts, and therefore market share is a dimensionless quantity. The market share of a company during a given time period, measured in number of product units sold, is

Market share = Units sold by the company / (Units sold by the company + Total units sold by its competitors)    (5.3)

As before, the market share in this case is a dimensionless ratio. We can also compute the market share of a single product category by using only the revenue or units sold in that category. For example, for a consumer electronics manufacturer, its share in the camera category for a given time period can be computed by either of the two following expressions:

Camera market share = Camera revenues of the company / (Camera revenues of the company + Total camera revenues of its competitors)    (5.4)

Camera market share = Units of cameras the company sold / (Units of cameras the company sold + Total units of cameras sold by its competitors)    (5.5)
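A quick numeric check of equations (5.1) and (5.2); the forecast values in the example call are placeholders chosen only to exercise the formulas.

```python
# CAGR per equation (5.1): x_b is the size in year 1, x_e the size in year t.
def cagr(x_b, x_e, t):
    return (x_e / x_b) ** (1 / (t - 1)) - 1

# Market share per equation (5.2), in dollar terms.
def market_share(company_revenue, competitors_revenue):
    return company_revenue / (company_revenue + competitors_revenue)

# Placeholder example: a market forecast to grow from $850m to $1,200m over five years.
print(f"CAGR: {cagr(850, 1200, 5):.1%}")       # ~9.0% per year
print(f"Share: {market_share(260, 590):.1%}")  # ~30.6%
```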
■ Basis for market segmentation
The ultimate goal of market segmentation is to create homogeneous segments where constituencies within each segment react uniformly to marketing stimuli. Market segmentation enables formulation of optimal marketing targeting strategies for each segment. The bases of segmentation for a particular product are market size, market growth rate, and market share. The first step in segmentation analysis is to identify the product of interest. In the case of the consumer banking industry, for instance, products and services can be broken down into segments such as checking, savings, credit card, line of credit, home equity, home mortgage, insurance, and brokerage. A bank may examine the market size of each product it offers or plans to offer and choose to focus on those products or services that have the largest market size, the highest growth rates, or the lowest market shares.

We next consider a hypothetical case study on segmentation. For Company W, the total market size of products A, B, C, and D in 2007 was $927m. In this case, market segmentation results in four distinct segments, illustrated in Figure 5-3.

Segment 1: Product D, market size $5 million, annual growth 2%, priority low
Segment 2: Product C, market size $132 million, annual growth 10%, priority high
Segment 3: Product B, market size $525 million, annual growth 5%, priority high
Segment 4: Product A, market size $265 million, annual growth 15%, priority high

Figure 5-3 Market segmentation by market size and growth.

● Segment one: small size and low growth
● Segment two: small size and high growth
● Segment three: large size and low growth
● Segment four: medium size and high growth.

Segments three and four represent the most attractive opportunities, followed by segment two. Segment one represents the least attractive opportunity.

Market segmentation by market size, market growth, and market share: case study one
So far, we have discussed market size and market growth; we now revisit the last hypothetical case study by adding the market-share consideration (Figure 5-4).

Segment 1: Product D, market size $5 million, annual growth 2%, market share 50%, incremental opportunity $2.5 million, priority fourth
Segment 2: Product C, market size $132 million, annual growth 10%, market share 30%, incremental opportunity $92.4 million, priority first
Segment 3: Product B, market size $525 million, annual growth 5%, market share 90%, incremental opportunity $52.5 million, priority third
Segment 4: Product A, market size $265 million, annual growth 15%, market share 70%, incremental opportunity $79.5 million, priority second

Figure 5-4 Market segmentation by market size, growth, and share.

● Segment one: small size, low growth, and medium market share
● Segment two: small size, high growth, and low market share
● Segment three: large size, low growth, and medium market share
● Segment four: medium size, high growth, and high market share.

Market share information provides additional insight on where true market opportunities lie. Although the total market size for segment four is $265m, the incremental market opportunity for Company W is only $79.5m. After consideration of the market share data in the four segments, we conclude that segment two is the most attractive segment, followed by segments four, three, and one.
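The incremental-opportunity figures in Figure 5-4 are consistent with the reading incremental opportunity = market size x (1 - market share). The sketch below applies that reading and ranks the segments accordingly; the dictionary layout is ours, and the formula is inferred from the figure rather than stated in the text.

```python
# Incremental opportunity = market size x (1 - market share), inferred from Figure 5-4.
segments = {
    "Segment 1 (Product D)": {"size": 5,   "growth": 0.02, "share": 0.50},
    "Segment 2 (Product C)": {"size": 132, "growth": 0.10, "share": 0.30},
    "Segment 3 (Product B)": {"size": 525, "growth": 0.05, "share": 0.90},
    "Segment 4 (Product A)": {"size": 265, "growth": 0.15, "share": 0.70},
}

for name, s in segments.items():
    s["incremental_opportunity"] = s["size"] * (1 - s["share"])

ranked = sorted(segments.items(), key=lambda kv: kv[1]["incremental_opportunity"], reverse=True)
for rank, (name, s) in enumerate(ranked, start=1):
    print(f"{rank}. {name}: ${s['incremental_opportunity']:.1f}m incremental opportunity")
```

Running the sketch reproduces the priorities in the figure: segment two first ($92.4m), then segments four, three, and one.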
We recommend the following three-step process to incorporate market opportunity information into marketing planning.

● Identification of the market size and its geographic and product breakdown: Table 5-2 is a template for compiling the relevant information. The product or geography with the largest market size is often the main revenue source for a company.

Table 5-2 Template for market size by product and geography

               Product A ($)   Product B ($)   Product C ($)   Product D ($)   Total ($)
Region 1
Region 2
Region 3
Region 4
All regions

● Identification of high-growth market opportunities: Targeting high-growth opportunities enables revenue generation and long-term competitiveness. Table 5-3 is a template for documenting market growth information.

Table 5-3 Template for annual market growth rate by product and geography

               Product A (%)   Product B (%)   Product C (%)   Product D (%)   Total (%)
Region 1
Region 2
Region 3
Region 4
All regions

● Identification of the market share: Market share information provides insight on the actual room for growth. Table 5-4 shows a layout for documenting the market share information.

Table 5-4 Template for market share by product and geography

               Product A (%)   Product B (%)   Product C (%)   Product D (%)   Total (%)
Region 1
Region 2
Region 3
Region 4
All regions

Using market research and data mining for building a marketing plan
It is common practice for firms to set their revenue goals using market data as the benchmark. A particular company may set a revenue growth goal of 15% just to outperform the anticipated market growth of 10% and to gain market share. It is also a very common practice for companies to apply an arbitrary percentage (usually a single digit) to their revenue goal to determine their marketing budgets. This is particularly prevalent in the high-tech industry. For instance, a high-tech company may expect to generate $10b in revenue and plan to allocate 5% of the expected revenue, or $500m, as its marketing budget. A more information-driven approach is to apply the marketing spending modeling techniques discussed in Chapter 2 to analyze historical sales and marketing spending data to produce an optimal marketing budget and allocation.

Marketing planning based on market segmentation and overall company goal: case study two
Based on the previous market segmentation case study illustrated in Figure 5-4, the most attractive opportunities are segments two, three, and four. Company X is one of the companies competing in these segments. A 2007 marketing plan for Company X will be created based on the market segmentation information. The first step in creating the marketing plan is to populate the template in Table 5-5 (template one) with the market data of 2006 and 2007. Then, the incremental market size growth from 2006 to 2007 and the percent contribution to the overall market size growth are populated for each segment. The total market is expected to grow 8.5%, or $72m, from 2006 to 2007. Out of the $72m, $35m (49% of total growth), $25m (35% of total growth), and $12m (16% of total growth) are from products A, B, and C respectively. The second step is to incorporate the actual revenue and market share of 2006 into template two, as shown in Table 5-6. The third step is to incorporate revenue and market share goals into template two in Table 5-6. To determine a realistic revenue growth goal for each of the three segments, we need to evaluate the historical growth rate of each product. Table 5-7 shows the 2007 revenue and market share goals for Company X, based on its historical marketing spending and revenue data.
Table 5-5 Template one – identification of market size of 2006 and 2007, and growth from 2006 to 2007

Segment   Product   2006 market size ($m)   2007 market size ($m)   Growth (%)   Incremental market growth ($m)   % of incremental market growth
4         A         230                     265                     15           35                               49
3         B         500                     525                     5            25                               35
2         C         120                     132                     10           12                               16
2, 3, 4   Total     850                     922                     8.5          72                               100

Table 5-6 Template two – incorporation of actual company revenue and market share information from 2006

Segment   Product   2006 Company X revenue ($m)   2007 incremental revenue goal ($m)   Revenue increase (%)   2007 revenue goal ($m)   2006 market share (%)   2007 market share goal (%)
4         A         60                            –                                    –                      –                        60/230 = 26             –
3         B         120                           –                                    –                      –                        120/500 = 24            –
2         C         80                            –                                    –                      –                        80/120 = 67             –
          Total     260                           –                                    –                      –                        260/850 = 31            –

(The goal columns are left blank at this step and are filled in during the third step.)

Table 5-7 Template three – incorporation of company revenue and market share goals by product

Segment   Product   2006 Company X revenue ($m)   2007 incremental revenue goal ($m)   Revenue increase (%)   2007 revenue goal ($m)   2006 market share (%)   2007 market share goal (%)
4         A         60                            9                                    15                     69                       26 (60/230)             26.0
3         B         120                           12                                   10                     132                      24 (120/500)            25.1
2         C         80                            10.2                                 13                     90.2                     67 (80/120)             68.3
          Total     260                           31.2                                 12                     291.2                    31 (260/850)            31.6

The company is expected to grow faster than the market in the product B and C categories and to grow at the same pace as the market in the product A category. Achieving the revenue and growth goals for products A, B, and C will lead to a market share increase from 31% to 31.6%, a 0.6 percentage point increase. As illustrated in this example, a drastic increase in market share is hard to achieve. An increase of $31.2m in revenue from products A, B, and C only results in a 0.6 percentage point market share gain.

The fourth step is to use the results from the third step to populate template three with budget information, as illustrated in Table 5-8.

Table 5-8 Year 2007 budget allocation based on historical data and modeling

Segment   Product   2006 revenue ($m)   2007 incremental revenue goal ($m)   Revenue increase (%)   2007 revenue goal ($m)   Historical incremental revenue over budget ratio   2007 budget ($m)   2006 market share (%)   2007 market share goal (%)
4         A         60                  9                                    15                     69                       1.29                                               7.0                26                      26
3         B         120                 12                                   10                     132                      1.33                                               9.0                24                      25
2         C         80                  10.2                                 13                     90.2                     1.70                                               6.0                67                      68
          Total     260                 31.2                                 12                     291.2                    1.42                                               22.0               31                      31.6

The total budget is $22m. This is exactly where we can tie together market opportunity information, marketing budget, and overall company revenue. Segments four and three have low market shares. There is significant competition in these two segments, and therefore they require significant investment to gain new customers and market share. This competitive situation is reflected in the lower historical revenue-to-budget ratios. Segment two has a high market share and less competition. Market share gains and losses are highly correlated with competitive pressure, as will be discussed later in this chapter.
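The budget column in Table 5-8 is simply the incremental revenue goal divided by the historical incremental-revenue-over-budget ratio. A minimal sketch of that step, using the Table 5-8 figures (variable names are ours):

```python
# 2007 budget per product = incremental revenue goal / historical incremental-revenue-to-budget ratio.
plan = {
    "A": {"incremental_revenue_goal": 9.0,  "revenue_over_budget_ratio": 1.29},
    "B": {"incremental_revenue_goal": 12.0, "revenue_over_budget_ratio": 1.33},
    "C": {"incremental_revenue_goal": 10.2, "revenue_over_budget_ratio": 1.70},
}

total_budget = 0.0
for product, p in plan.items():
    budget = p["incremental_revenue_goal"] / p["revenue_over_budget_ratio"]
    total_budget += budget
    print(f"Product {product}: ${budget:.1f}m budget")
print(f"Total 2007 marketing budget: ${total_budget:.1f}m")   # ~$22m
```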
■ Target-audience segmentation
The target audience of a segment is a group of individuals, households, or businesses that possess similar characteristics and behavior. The following section gives an overview of the common attribute groups used to describe a target audience.

Target-audience attributes
● Demographic or corporate attributes: These attributes describe the general characteristics of an individual, a household, or a company. Age, gender, ethnicity, marital status, education, life stage, personal income, and home ownership are examples of individual demographic attributes. Household income and household size are examples of household demographic attributes. Company size, company annual revenue, industry or Standard Industrial Classification (SIC) code, and company start year are examples of corporate attributes.
● Socioeconomic attributes: These attributes describe the socioeconomic status of a household or an individual. They are usually constructed based on zip code level census information by data vendors or marketing research companies (e.g., Personicx, Prizm, Microvision, Cohorts, and IXI). For example, one of the attributes used by Personicx is referred to as 'established elite.' Individuals with this attribute tend to have a higher than average disposable income and a luxurious lifestyle.
● Attitudinal attributes: These attributes describe an individual's hobbies, interests, and social, economic, or political views, such as interest in art, space and science, sports, cooking, tennis, travel, politics, economics, antiques, or fitness.
● Purchase behavior attributes: These attributes describe where an individual, a household, or a business is in a sales cycle. Stages in the sales cycle, such as awareness, interest and relevance, consideration, purchase, and loyalty and referral, are described in detail in Chapter 3.
● Need attributes: These attributes describe a customer's or a prospect's need for acquiring or inquiring about a product. Need for pain relief and need for wireless Internet connection are examples of need attributes.
● Marketing medium preference attributes: These attributes describe an individual's preference on how to be contacted, receive information, or interact with marketers. In-person visits, direct mail, print, TV, telemarketing, billboards, newspaper print ads, and magazine inserts are examples of offline media. E-mail, online banners, search, communities, podcasts, and blogs are examples of online media.

Types of target-audience segmentation
There are multiple ways of segmenting a target audience. Segmentation of the audience needs to be aligned with business objectives. The four most common criteria for segmentation are demographics, needs, product purchased, and value, as illustrated in Figure 5-5.

● Demographics-based segmentation: This is the most common segmentation approach. It gives general descriptions of the various segments in the target audience. This type of segmentation is very useful for providing insight and ideas regarding marketing creative, offers, and messages.
● Need-based segmentation: This type of segmentation classifies the audience by their needs and is useful for constructing relevant product or service offers for potential customers.
● Product-purchased or install-base segmentation: This type of segmentation classifies the audience by what they have purchased or deployed at their sites. This information is useful for driving targeted cross-sell and up-sell marketing strategies and tactics.
● Value-based segmentation: This type of segmentation classifies the audience by their value, often derived from their total dollar amount of purchases during a period of time.
This is a very practical approach, as the eighty-twenty rule shows that 80% of a marketer's revenues often come from the top 20% of the customers with the highest values.

Figure 5-5 Common segmentation types (demographics, need, install base or product purchased, and value), ranging from those typically covered by syndicated research to those requiring customized research.

Due to its cost-effectiveness, syndicated research is a good starting point for acquiring information on audience segmentation. However, syndicated research is sometimes limited in that it cannot provide in-depth value-based segmentation, which is best derived from the company's internal sales data. Another limitation of syndicated research is that a research firm may draw samples from, and segment, a population that is not fully representative of the desired target audience.

We now consider a segmentation case study in the business-to-business world. Figure 5-6 shows a small business customer value segmentation for Company A. The customer segmentation is derived with a data mining technique called Classification and Regression Trees (CART), which is discussed in detail in Chapter 7. In the case study, the average purchase amount of small business customers is $5000. To accomplish the segmentation task, we use the CART technique, which first splits the sample by industry and identifies the professional services industry as an industry with high average purchases of $8400, while the other industries have an average of $2000. Within the professional services industry, companies of size (in terms of number of employees) between 50 and 500 have average purchases of $12,000. This is the subsegment with the highest value in the whole sample. The tree technique splits the segment in the other industries into two branches. Similar to the professional services industry, the companies with company size (in number of employees) between 50 and 500 have higher average purchases of $5200, versus $1000 for the smaller companies (company size < 50). Among the companies with company size between 50 and 500 and with more than five branch offices, the average purchases are $5900.

Overall SB base: average purchase = $5000
  Industry: Professional services: average purchase = $8400
    # Employees < 50: average purchase = $6000
    # Employees between 50 and 500: average purchase = $12,000
  Industry: Other: average purchase = $2000
    # Employees < 50: average purchase = $1000
    # Employees between 50 and 500: average purchase = $5200
      # Branch offices < 2: average purchase = $3200
      # Branch offices between 2 and 5: average purchase = $4500
      # Branch offices > 5: average purchase = $5900

Figure 5-6 Small business customer average annual purchases of equipment X.

This case study illustrates that with the application of appropriate data mining segmentation techniques, valuable customer subsegments can be uncovered.
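A value-segmentation tree like Figure 5-6 is the kind of output a CART implementation produces. Below is a minimal sketch using scikit-learn's DecisionTreeRegressor, one common CART implementation; the handful of customer records is fabricated for illustration and does not reproduce the book's case study.

```python
# Illustrative CART value segmentation; the records below are made-up placeholders.
from sklearn.tree import DecisionTreeRegressor, export_text

# Features per customer: [is_professional_services (0/1), employees, branch_offices]
X = [
    [1, 20, 1], [1, 200, 3], [1, 450, 5], [1, 30, 2],
    [0, 10, 1], [0, 60, 1], [0, 120, 3], [0, 300, 6],
]
y = [6000, 12000, 11500, 5800, 900, 3200, 4500, 5900]   # annual purchase amount, $

tree = DecisionTreeRegressor(max_depth=3, min_samples_leaf=1, random_state=0)
tree.fit(X, y)

# Print the fitted splits and the average purchase amount at each leaf.
print(export_text(tree, feature_names=["professional_services", "employees", "branch_offices"]))
```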
■ Understanding route to market and competitive landscape by market segment
Once market opportunities and the target audience are identified, the next step is to assess the ability to compete in each segment through an understanding of the route to market and the competitive landscape.

Routes to market
Customers purchase products through different avenues. A route to market is an avenue through which customers purchase products. In the case of direct sales, customers purchase directly from marketers. In the case of indirect sales, customers purchase from intermediaries. These intermediaries are called channel partners, retailers, resellers, or distributors. For example, a company that designs and manufactures women's clothing may use several routes to market its products to customers. These routes include the company's own physical stores, print catalogs, e-catalogs, department retail stores, and online websites.

Direct sales
In direct sales, products are sold directly to customers. In many cases, selling directly to customers is not a scalable business model, and the need for leveraging a third-party reseller or distributor emerges. For example, consumer goods manufacturers leverage retail stores such as consumer electronics stores and supermarkets to distribute and sell their products. Leading high-tech companies often leverage their channel partners to a very large degree. In general, direct sales are a more common model in business-to-consumer than in business-to-business.

Indirect sales
In indirect sales, companies leverage intermediaries to sell their products or services. The main advantage of this distribution method is scalability. These intermediaries are called distributors, resellers, partners, wholesalers, retailers, or channel partners. Good channel partners often enhance a company's revenue growth. Companies using channel partners often rely on the partners to contact and interact with end customers. As a result, these companies often do not have visibility into end customer information. However, channel partners can often provide end user data at an aggregate level. For example, instead of revealing actual end user names, distributors can provide reports on end user sales by vertical industry, company size, and geography. There are different types of indirect sales models depending on the number of intermediaries involved. A one-tier model refers to a model where there is only one channel partner between a vendor and an end user customer. A two-tier model is a model where there are two layers of intermediaries between a vendor and an end user customer. In this case, a vendor sells its product to a channel partner that then sells the product to a reseller. The reseller then sells the product to an end user customer.

Revenue and investment flows
Understanding cost-effectiveness by route to market is essential for establishing an optimal channel strategy balance. Figure 5-7 illustrates revenue flows from direct and indirect sales.

Figure 5-7 Direct and indirect revenue and investment flows: the company invests in its channel partners and in end customers, and receives revenue and profit from both.

It is important for firms to evaluate revenue and profit streams from intermediaries and end customers, as well as the firms' marketing investments for selling to both groups. If the returns on investment are significantly higher from intermediaries than from end customers, then it is necessary to explore the underlying reasons. It is possible that the market climate is such that end customers prefer buying from intermediaries. Objectively assessing returns on investment from both direct and indirect sales enables firms to embark on an optimal strategy of direct and indirect sales. We now consider the market segmentation case study in Figure 5-3. Within each market segment, the contributions of direct sales and channel partner sales are rated as 'fair' or 'poor,' as shown in Table 5-9.
The company relies mainly on channel partners or others for selling its products in segments three and four. On the contrary, the company sells most of its products directly to its customers in segment two. Degree to which the firm relies on channel partners drives its investments in marketing spending for direct sales and marketing spending for indirect sales. Competitive landscape The way the market perceives the strengths and weaknesses of a particular firm affects the purchasing behavior of its customers. For instance, it is well known that a manufacturer with a solid brand inspires more trust in customers, and trust is often a key driver for product selection. 93 94 Data Mining and Market Intelligence Table 5-9 Market segmentation with route to market information overlay Market size Market growth Low High Small Segment 1 Direct sales: poor Sales through channels: poor Segment 2 Direct sales: fair Sales through channels: poor Large Segment 3 Direct sales: poor Sales through channels: fair Segment 4 Direct sales: poor Sales through channels: fair Understanding the competitive landscape in each market segment is to understand a firm’s own strengths and weaknesses as well as those of its competitors. There are many attributes that can be used to evaluate the competition. Before analyzing these attributes, however, we must identify the competitors. The most common way of identifying key existing and potential competitors is to consult industry trade publications, industry financial analysts, or research experts. There is often ranking information on companies in each industry, product or service category. Ranking can be based on market share, growth, or financial position. Sales people and customers can also help identify competitors. Since it is extremely challenging to examine every potential competitor, evaluation should be limited to the top existing and potential competitors. Once the key existing and potential competitors have been identified, the next task is to determine which attributes to use to examine the strengths and weaknesses of each competitor. In general, there are four groups of attributes to consider when analyzing the strengths and weaknesses of the competition. The four groups of competitive attributes are brand recognition, leadership, vision and innovation, current product offering, operational efficiency, and financial condition. ● Brand recognition, leadership, vision, and innovation: The four seemingly intangible attributes sometimes are important drivers of customers’ purchase decisions. Brand recognition refers to a set of perceptions and feelings evoked in customers or prospects when they are exposed to ideas such as value propositions or images (logos, Understanding the Market through Marketing Research ● ● ● symbols) about particular companies. Brand recognition is the result of customer experience and interaction with a particular company, or customer exposure to advertising, marketing, and other activities of the company. Leadership is the ability of an individual to influence, motivate, and enable others to contribute toward the effectiveness and success of the organizations of which they are members (House, Hanges, Javidan, and Dorfman, 2004). Good leadership is consistently viewed as a competitive advantage for a company. Vision refers to the long-term objectives of a company. With its vision as a guiding principle, a company may be more likely to evolve in a manner consistent with its long-term objectives. 
Innovation is change that creates a new dimension of performance (Hesselbein, Johnston, and the Drucker Foundation, 2002) and drives competitiveness. Current product offering: The current product offering of a firm has unique features and benefits. How these features and benefits are perceived in the market segment affects customer purchase behavior. For example, a product that is more reliable than its competing products will attract buyers that value reliability. In addition to reliability, attributes such as customer service, quality, relevance, convenience, ease of deployment and installation, scalability, warranties, variety, and pricing are also important. Service, in particular, has become a crucial factor for customers when evaluating products. Operational efficiency: Operational efficiency in corporate functions such as manufacturing, management, sales, marketing, fulfillment, inventory, and customer service are also important in shaping market perception of a firm. Companies with frequent delays in product delivery or companies that deliver defective products are likely to be perceived as companies with operational ‘weakness.’ Financial condition: Financial condition is the overall company’s financial performance, as reflected by indicators such as stock growth, revenue, profitability, returns on equity, returns on assets, debt and cash positions that affect the ability of the firm to acquire financing when necessary, capitalization, and the P/E ratio. A strong financial condition is considered a competitive advantage. Competitive analysis methods There are four analytical formats for analyzing competitive landscape: tabulation, grid, strength, weakness, opportunity, and threat (SWOT) analysis, and perceptual maps. ● Tabulation: The tabulation format is the easiest approach for compiling competitor information. In the following example, the six supermarket chains in the San Francisco Bay Area are evaluated for their 95 96 Data Mining and Market Intelligence ● ● competitiveness by brand recognition. The ratings range from one (the weakest rating) to five (the strongest rating) in each attribute category. These six supermarket chains in the analysis are Safeway, Albertsons, Bell, Costco, WholeFoods, and Ranch 99. Safeway and Albertsons are the two mainstream supermarket chains in the Bay Area and the betterknown brands of the six. Safeway and Albertsons have considerably more stores than the other four competitors. Costco is well known for its warehouse environment and low prices. Bell is slightly lesser known than Safeway, Albertsons, and Costco while WholeFoods caters to a more upscale market. Ranch 99 mainly caters to the Asian community. Based on the above information, we may give Safeway and Albertsons a rating of five, Costso a rating of four, and Bell and Ranch 99 a rating of three for brand awareness among mainstream grocery shoppers. The advantage of the tabulation approach is that it compares each player ’s strengths and weaknesses with each other ’s in detail. The disadvantage is that the tabulation approach does not provide a holistic summary of the overall competitiveness. Therefore, when the number of attributes is large, it’s difficult to derive a clear overall picture with the tabulation method. Grid is a common format used by research companies to analyze the competitive landscape. These companies use their proprietary methodologies to examine company competitiveness and present the result in a grid. Very often a competitive grid has two or three key indicators. 
Each key indicator is usually a composite index based on the values of specific key attributes. Some of these key attributes are similar to what we have discussed in the tabulation example. Gartner Research has developed Magic Quadrant, a graphical presentation of the competitive landscape for each of its key technology groups. Forrester Research has developed a competitive grid called Forrester Wave. Unlike the tabulation approach, grids don’t show detailed information about each player ’s strengths and weaknesses at the attribute level. Instead, grids provide a holistic, synthesized, and graphic summary view of the competitive landscape. SWOT, which stands for strengths, weaknesses, opportunities, and threats, is a very popular format for competitive landscape analysis. A SWOT analysis summarizes a firm’s overall competitive strengths and weaknesses, the market opportunity, and the competitor threats. We now reconsider our previous example of market segmentation of Company X in Figure 5-5 with a focus on the competitive landscape in segment four. The market size of this segment is estimated to be $230m in 2006 and $265m in 2007. The growth from 2006 to 2007 is 15%. The following is a four-step process for constructing a SWOT analysis. The first step is to identify Company X’s strengths, weaknesses, market opportunities, and competitive threats. The result is Understanding the Market through Marketing Research Table 5-10 Identification of the strengths, weaknesses, opportunities, and threats in segment four Strengths Weaknesses ● Extensive channel partner ● Company X’s market share is a network for distributing and selling ● Reliable product ● Competitive price distant no.2 from that of the market leader, TUV (market share: 50%) ● Company X has poor brand name recognition compared to TUV Opportunity Threats ● High overall market growth ● Market leader TUV is actively at 15% pursuing Company X’s largest customers ● A local vendor in Asia just announced a major price-cutting promotion illustrated in Table 5-10. The second step is to leverage the strengths to take advantage of the current opportunities or mitigate the competitive threats. This is shown in Table 5-11. The third step, illustrated in Table 5-12, is to prevent weaknesses from sabotaging opportunities or amplifying competitive threats. The fourth step is to maintain areas of strengths and strengthen areas of weakness over time. The outcome Table 5-11 Leveraging strength to take advantage of opportunities or mitigate threats Leveraging strength to take advantage of opportunities Leveraging strength to mitigate threats ● Leveraging Company X’s extensive ● Creating a customer loyalty channel partner network to capture high market growth (e.g., increasing investment in joint customer seminars with partners) ● Promoting value propositions on product reliability and competitive pricing program to prevent customer attrition due to TUV’s threat ● Examining and negotiating the profit-margin structure with partners in Asia to ensure maximum level of price competitiveness 97 98 Data Mining and Market Intelligence Table 5-12 Preventing weaknesses from sabotaging opportunities or amplifying threats Preventing weaknesses from sabotaging opportunities Preventing weaknesses from amplifying threats ● Poor brand recognition may ● Poor brand recognition may prevent Company X from taking advantage of this opportunity. 
Company A needs to invest in its brand awareness programs prevent company X from convincing the market that it provides value with a price premium in Asia. Company X needs to promote its value proposition and brand in Asia of a SWOT analysis will not only guide short-term planning, but also point out areas for improvement for long-term success. Like the other competitive analysis formats, SWOT has its advantages and disadvantages, as shown in Table 5-13. Table 5-13 Advantages and disadvantages of SWOT Advantages of the SWOT method Disadvantages of the SWOT method ● Information is easy to acquire ● Analysis may be subjective ● Hard to quantify the from syndicated research companies or internal marketing/sales groups ● Analysis is easy to construct ● Analysis is easy to digest ● interrelationships between the four components, namely, strength, weakness, opportunity, and threat ● Hard to tie the information to customers, their needs, and their future purchase plans ● Public information may lead to minimal competitive advantages A perceptual map is a graphical depiction of the market perception of a product. The key difference between a grid and a perceptual map is that the latter is constructed with statistical data mining techniques while the former is usually derived from marketing research data summaries. We now consider the illustrative case study of Company X in Figure 5-5. A survey is conducted on three groups of audiences for product A: prospects, high-value customers (customers who have Understanding the Market through Marketing Research purchased high volume of product A), and low-value customers (customers who have purchased low volume of products A). The three groups of audiences are asked to identify whether six specific features are crucial to their purchase decisions. These six features are brand strength, uniqueness of features, pricing, product quality, customer service, and ease of acquiring the product. The three groups of audiences are also asked to identify if they associate any of these six features with the three main vendors, Company X, Competitor Y, and Competitor Z. The results of the survey are compiled and correspondence analysis, which we will discuss in Chapter 7, is conducted to construct the perceptual map shown in Figure 5-8. From the map, we can observe the distances between the three target audiences, the six product features, and the three companies. Close proximity indicates a higher degree of association. On the map, Company X is close to Low-value customers Dimension 2 Competitor Y Uniqueness of features Pricing Prospects Brand strength Competitor Z Easy to acquire Company X Customer service Product quality High-value customers Target audience Company or product attribute Company name Dimension 1 Figure 5.8 Perceptual map analysis of product A. three features: brand strength, product quality, and customer services. Therefore, these three features are Company X’s strengths and competitive advantages. In addition, Company X is the vendor that is closest to the high-value customer segment. This means that Company X is viewed by this audience segment more favorably than its two competitors. Like the other competitive analysis formats, the perceptual map has its advantages and disadvantages, as shown in Table 5-14. 
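A perceptual map like Figure 5-8 is typically produced with correspondence analysis, which Chapter 7 covers in detail. As a preview, the following is a minimal sketch of the underlying computation, applied to a hypothetical vendor-by-attribute association table. The counts are invented, and for brevity only the vendors and attributes are mapped, not the three audience groups.

```python
# Minimal sketch of the correspondence analysis behind a perceptual map.
# The association counts below are hypothetical: rows are vendors, columns are the
# number of respondents who associate each attribute with that vendor.
import numpy as np

attributes = ["brand", "unique features", "pricing", "quality", "service", "easy to acquire"]
vendors = ["Company X", "Competitor Y", "Competitor Z"]
counts = np.array([
    [80, 20, 25, 75, 70, 30],   # Company X
    [40, 70, 65, 30, 25, 35],   # Competitor Y
    [35, 30, 55, 40, 30, 60],   # Competitor Z
], dtype=float)

P = counts / counts.sum()                              # correspondence matrix
r, c = P.sum(axis=1), P.sum(axis=0)                    # row and column masses
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))     # standardized residuals
U, sv, Vt = np.linalg.svd(S, full_matrices=False)

# Principal coordinates: points that sit close together on the first two dimensions
# are strongly associated, which is what the perceptual map displays.
row_coords = (U * sv) / np.sqrt(r)[:, None]
col_coords = (Vt.T * sv) / np.sqrt(c)[:, None]

for name, xy in zip(vendors, row_coords[:, :2]):
    print(f"{name:14s} dim1={xy[0]:+.2f} dim2={xy[1]:+.2f}")
for name, xy in zip(attributes, col_coords[:, :2]):
    print(f"{name:14s} dim1={xy[0]:+.2f} dim2={xy[1]:+.2f}")
```

Reading the output follows the same logic applied to Figure 5-8: a vendor plotted near particular attributes, or near a particular audience group when audiences are included as rows, is the vendor most strongly associated with them.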
99 100 Data Mining and Market Intelligence Table 5-14 Advantages and disadvantages of a perceptual map Advantages of perceptual map Disadvantages of perceptual map ● Provides graphic representation ● Requires significant investment of multiple dimensions simultaneously ● Relationships between competitive advantages, target audience, customer needs, and customer future purchase plans are easily quantifiable ● Audience feedback provides objective view of the competitive landscape ● Proprietary analysis may lead to competitive advantage in time, resources, and expertise in gathering and analyzing relevant data ■ Overview of marketing research Marketing research is research that helps advance understanding of the market and the customers, generating information that helps make better marketing investment decisions. The following skills are the required skills for conducting marketing research. ● ● ● ● ● ● ● ● ● ● Economic, business, and statistical knowledge Experience in syndicated research and customized research Experience in primary and secondary data Knowledge of survey sampling, sample size, and questionnaire design Knowledge of focus group research Knowledge of panel studies Knowledge of request for proposal (RFP) and research vendor management Knowledge of list rental and list brokerage business Ability to communicate and explain complex research concepts to both business and IT audiences Ability to provide actionable recommendations to address business issues. Figure 5-9 shows a step-by-step thought process for marketing research planning and implementation. Understanding the Market through Marketing Research Identify business objectives Determine final research deliverable requirements Search current available Y syndicated research and determine if it addresses the need Utilize syndicated research N Acquire customized research: solicit research vendor proposals with a RFP Select appropriate vendor proposal Actively participate in the research process including sample selection and questionnaire design Translate research results to actionable business recommendations Figure 5-9 Effective marketing research thought process. Throughout the remaining of this chapter, we will introduce the following important market research topics: syndicated research versus customized research, primary data versus secondary data, sample size, questionnaire design, focus groups, and panel studies. Syndicated research versus customized research Syndicated research, which can be acquired through subscription, is research that is prepackaged by research companies. This type of research is conducted on the basis of the research firms’ assumptions, specifications, and criteria. When searching for market data and intelligence, we should first consider syndicated research since it is one of the most costeffective sources. Different research firms specialize in different industries and products. Subscriptions are usually on a one-time, quarterly, or annual basis. In addition to selling prepackaged syndicated research, research companies often provide consulting services arranged along the following lines. ● Subscription to an inquiry service grants subscribers direct access to analysts for additional information beyond the standard reports. There is usually a threshold on how much time a subscriber can spend 101 102 Data Mining and Market Intelligence ● with analysts either face to face or by phone over the duration of his subscription. 
Occasionally, analysts may recommend a one-time project to address a subscriber ’s additional needs. A one-time project may lead to a customized research project, as we discuss in the next section. Customized research tends to be more expensive than syndicated research as the former is customized for very specific needs and the latter is intended for a broader audience base. Customized research has very specific objectives and deliverables customized to a particular marketer ’s needs and often involves collecting primary survey data. Table 5-15 illustrates the main differences between syndicated research and customized research. Table 5-15 Syndicated research versus customized research Research specification Data collectors Cost Syndicated research Third-party research company Third party Low Customized research Marketers themselves Third party or marketers themselves High The following step-by-step process should be followed for planning and executing customized research: ● ● ● ● ● ● ● ● Identification of business objectives Identification of deliverables that allow the research project to meet the objectives Creation of an RFP to solicit research vendors’ proposals Evaluation and selection of vendor proposals Determination of sample size and source Designing of the questionnaire(s) Collection of data Analysis of results to derive learning. Customized research planning case study Company ABC is a storage system supplier for Fortune 500 companies in the US. The overall objective of ABC is to understand the future storage spending of its customers, their vendor preferences, their purchase Understanding the Market through Marketing Research processes, and the appropriate marketing messages. Specifically, Company ABC wants to address the following questions through customized research. ● ● ● ● ● How much do the Fortune 500 companies plan to spend on storage in 2008? Do different industries have different levels of need in storage systems in 2008? Which vendors are on the top of mind among the Fortune 500 companies when it comes to storage system purchases? What are the Fortune 500 companies’ selection criteria for storage vendors? What marketing messages will resonate well with these Fortune 500 companies? The following five deliverables can be used to address the questions above: ● ● ● ● ● Understanding the 2008 budgets for storage systems among the Fortune 500 companies Analyzing the 2008 storage budgets by industry Compiling vendor rankings from survey respondents at Fortune 500 companies Getting feedback regarding vendor selection criteria Getting input regarding drivers and barriers for purchasing storage systems. After Company ABC identifies its research objectives and deliverables, it creates an RFP to solicit vendor proposals. An RFP is an effective way of collecting vendor proposals for evaluation. An RFP does not need to be overly complex. However, it needs to cover the following key components: ● ● ● ● ● ● ● Project overview Objectives Deliverables Methodology Proposal submission Project timeline General conditions and terms. The next example illustrates the use of RFP to solicit vendor proposals. Project overview ABC, a company specializing in storage systems, wants to understand the future needs of this market to prioritize marketing investments and resources. 
103 104 Data Mining and Market Intelligence Objectives The project has the following objectives: ● ● ● ● ● Understanding the 2008 budgets for storage systems of Fortune 500 companies Analyzing the 2008 storage budgets by industry Compiling vendor rankings from survey respondents at Fortune 500 companies Getting feedback regarding vendor selection criteria Getting input regarding drivers and barriers for purchasing storage systems Deliverables ● ● ● ● Executive summary Analysis and recommendations to support the business objectives Raw survey data An on-site presentation explaining the results and conclusions of the research Methodology Blind face-to-face interviews Proposal submission Submit proposal to ABC by September 27, 2007. Contact information: Sheila Wu, Research Manager Tel: (703) 446-5272, e-mail: swu@abc.com Project completion timeline Completion by November 30, 2007. General conditions and terms All information provided herein is the proprietary of ABC, Inc. This information is furnished specifically and solely to allow the prospective vendor to estimate the cost of executing this project. Any other usage of this information is strictly prohibited without the prior written consent of ABC. After distributing the RFP, Company ABC waits for the research vendors to submit their responses. ABC should look for the following key components in a vendor proposal: ● ● ● ● ● ● ● Project overview Objectives Methodology Data and analysis Survey sample and questionnaire (if collection of primary survey data is required) Deliverables Project timeline Understanding the Market through Marketing Research ● ● ● ● Vendor project team (team member qualification and biography) Overview of vendor capability (competitive strengths relative to the other vendors) Fees General conditions and terms (including legal and contractual agreements). A vendor proposal is, in essence, a research plan. A proposal should correspond to the key components in the RFP with more in-depth information. Table 5-16 illustrates a comparison between an RFP and a vendor proposal in terms of key components. Table 5-16 Comparison of key components between an RFP and a ven- dor proposal Key component required RFP Vendor proposal Project overview Objectives Deliverables Methodology Data and analysis Survey sample and questionnaire* Proposal submission Project timeline Overview of vendor capability Vendor project team and member bio Professional fee General conditions and terms Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes * If primary survey research is required. Primary data versus secondary data Primary data refers to research data directly collected from the target audience. Data collected from the target audiences by others (third parties) is called secondary data. It is often more expensive to acquire primary data than secondary data. However, there are situations where collection of primary data is necessary. For example, ● ● The business objectives cannot be met by any existing syndicated research. The target audience is so specific that no syndicated research can address the specific need. 105 106 Data Mining and Market Intelligence ● Existing syndicated research reports may offer conflicting information that cannot be reconciled. It is very common for different research companies to produce differing forecasts for the same market. 
Surveys In a survey, a questionnaire or a script is used to collect information from a group of people through various communication methods such as direct mail, e-mail, telephone, and face to face. A questionnaire consists of a list of questions in either multiple choice or open-ended text format. Survey communication methods A variety of survey methods are available. Different audiences may have difference preferences about how they are surveyed. It is important to ensure that the responder composition is representative of the overall target audience. In a direct mail survey, questionnaires are sent to the target audience by postal mail. Respondents fill out the questionnaire and return the questionnaires in a business response envelope (BRE) or a business response card (BRC) by mail. Cost of direct mail can sometimes be high and is driven mainly by direct mail production and mail postage. Cost per response is higher if the target audience reached is irrelevant or unresponsive. Direct mail response rate is driven mainly by the relevance of the target audience and address data accuracy. Lower address data quality tends to result in lower response rate. Response rate may vary by industry, product, and service. Response time of direct mail varies and ranges from a couple of days to weeks. Most responses come in within a month. Response time depends on the complexity of the questionnaire as well. The more complex a questionnaire is, the longer its response time will be. E-mail In an e-mail survey, electronic questionnaires are sent to the target audience by e-mail. Respondents may respond by completing the questionnaire on the web, by e-mail, or by other methods. E-mail cost tends to be low and usually runs at several cents per e-mail sent. E-mail is one of the least expensive ways of conducting a survey. E-mail response time is usually much shorter than direct mail response time. Most responses come in within days. As is the case with direct mail, e-mail survey response rate depends on factors such as target audience and accuracy of e-mail address data. Understanding the Market through Marketing Research Phone surveys In a phone survey, the target audience is contacted by phone to answer a list of questions from a questionnaire or a script read by a phone survey representative. Phone interview cost is higher than direct mail and much higher than e-mail. However, phone interviews have the advantage of getting respondent data instantly and clarifying any questions or confusion that respondents may have. When interviewer training is required, additional training cost needs to be factored in. Phone interview response time is real time. Once a respondent is reached and agrees to go through a survey, the data is collected instantly. The challenge is to successfully reach the target audience. Phone interview response rate also depends on factors such as responsiveness of the target audience, accuracy of phone numbers, offer, and target audience availability. Training is essential for phone interviewers. Interviewers need to possess the basic understanding of the use of a script or questionnaire to collect answers from respondents. When the research subject is technical or specialized, additional training needs to be given to interviewers so they can articulate their questions appropriately. In some cases, interviewers need to have certain professional knowledge and experience to effectively carry out the survey. 
For example, a phone survey on computer server purchases may require interviewers with in-depth technical knowledge in the sever business. Interviewers also need to be very perceptive of the respondent’s reactions and must be able to make adjustments accordingly. Nowadays, phone interviews are often conducted in call centers where every interviewer has a cubicle, a phone, and a computer terminal to access information when needed. Usually a supervisor is assigned several interviewers to monitor. Supervisors are equipped with communication gear to give timely coaching or feedback to their staff interviewers. Interviewers’ access to timely feedback is one advantage that phone interviews may have over other types of interview methods such as direct mail, e-mail, or face-to-face interviews. Computer assisted telephone interviewing (CATI) is designed to enable phone interviewers to conduct telephone interviews effectively. CATI enables predictive dialing, questionnaire management, sample and quota management, data access, data entry, and analysis. Predictive dialing is a feature of CATI that allows for automatic dialing of batches of phone numbers to connect with phone interviewers and those they intend to survey. Sample and quota management is a feature of CATI that tracks and compares a predetermined quota on respondents and the number of respondents that a phone interviewer actually reaches and surveys. 107 108 Data Mining and Market Intelligence Prescheduled or intercept face-to-face interview Face-to-face interviews can be prearranged with potential interviewees and conducted in a predetermined location at a preset date and time. Prearranged interviews allow for careful screening of the target audience and suitable arrangement of the interview start and end time. Planning prior to interviews can be very time-consuming and resource intensive. On the other hand, face-to-face interviews that are not prearranged are intercept interviews, which do not require a great deal of time for audience screening. The preparation and planning time is minimal, and one can usually find a large sample of potential interviewees in places such as shopping malls. However, the quality of interviewees from intercepts might sometimes be questionable due to lack of prescreening. As is the case with phone interviews, once a respondent agrees to be interviewed, the response time of a face-to-face interview is real time. Response rate depends on numerous factors such as relevance of target audience and timing of the interview. Training is essential for face-to-face interviewers. The interviewers need to have a basic understanding of how to follow a script or questionnaire to collect answers from the respondents. In-person interviewers have the opportunities to observe the respondents face to face and adjust questions accordingly. Panel studies A panel is a group of people, households, or businesses that respond to questionnaires on a periodical basis. The duration of a panel can vary from days to years. Panel surveys are administered through direct mail, e-mail, and face-to-face interviews. Major research companies usually have established panels for ongoing surveys and monitoring. The cost of panel studies depends on the survey mechanism. Usually, phone and face-to-face interviews are more expensive than direct mail and e-mail. The response time also depends on the survey mechanism. The response rates of panel studies are usually higher since panels usually consist of dedicated respondents. 
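To make the quota-management idea concrete, the following is a small, hypothetical sketch of the bookkeeping such a system performs. The quota cells (company size by industry) are invented for illustration and are not tied to any particular CATI product.

```python
# Hedged sketch of CATI-style quota management: compare completed interviews against a
# predetermined quota per cell and tell the dialer whether a reached respondent should
# still be interviewed.
from collections import Counter

quota = {("small", "manufacturing"): 100, ("small", "services"): 100,
         ("large", "manufacturing"): 50,  ("large", "services"): 50}   # hypothetical cells
completed = Counter()

def should_interview(size: str, industry: str) -> bool:
    """Return True while the respondent's quota cell is still open."""
    cell = (size, industry)
    return completed[cell] < quota.get(cell, 0)

def record_completion(size: str, industry: str) -> None:
    completed[(size, industry)] += 1

# Example: a reached respondent from a small services firm.
if should_interview("small", "services"):
    record_completion("small", "services")
print(completed)
```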
The following is a list of examples on existing panels (Blankenship and Breen, 1995). ● ● Nielsen Media Research offers national measurement of television viewing National Television Index (NTI) using its People Meter to measure the television viewing of various household members. Arbitron’s Portable People Meter (PPM) measures consumers’ exposure to any encoded broadcast signal (e.g., cable TV and radio). Understanding the Market through Marketing Research ● ● ● NPD Group has an online panel of over 3m registered consumers (www.npd.com, 2007). Home testing Institute, a division of Ipsos, has a panel of households available for monthly mailing surveys. ACNielsen SCANTRACK collects scanner-based marketing and sales data weekly from a sample of stores. Panel surveys have several advantages over other alternatives. First, they provide the possibility of conducting longitudinal studies to observe behavioral changes in the same sample over time. Second, panel surveys usually cost less than nonpanel surveys since there is only a one-time setup cost with panels. Omnibus studies An omnibus study is an ongoing study in which new questions can be added gradually to a regular panel study. Omnibus studies are costeffective since multiple companies share the up-front survey setup cost. Omnibus studies are cost-effective when there are few questions to be added to the survey. Omnibus studies become less cost-effective when the number of additional questions is large (Blankenship and Breen, 1995). Focus groups A focus group is a small discussion group led by an experienced moderator, whose role is to stimulate group interactions. This format has the advantage of generating group insight that is not attainable through separate one-on-one surveys. Focus groups can be used for exploring new product ideas, advertising concepts, and customer attitudes and perception. It is a qualitative rather than a quantitative method given that the sample size is very small (usually between 7 and 12 people in a group). However, insight gathered from a focus group can be very helpful for planning further research and analysis via other mechanisms. The cost of a focus group can be significant. Such cost includes the expense in recruiting the group members, the moderator fee, facility access, and equipment for monitoring and recording. Sampling methods There are two types of sampling methods: probability and nonprobability sampling (Green, Tull, and Albaum, 1988). Probability sampling involves applying some sort of random selection. Nonprobability sampling does 109 110 Data Mining and Market Intelligence not involve application of random selection. There are four types of probability sampling methods. ● ● ● ● Simple random sampling: In simple random sampling, each subject has an equal probability of being selected. The first step in this sampling method is to assign each subject a computer-generated random number. For example, to select a sample of 1000 out of a population of 100,000, one would generate a uniform random number for each of the 100,000 records and select the 1000 records with the highest random numbers. Stratified sampling: In stratified sampling, the data is first divided into several mutually exclusive segments, and then a random sample is drawn from each segment. Cluster random sampling: In cluster random sampling, the data is first divided into mutually exclusive clusters (segments), and then one cluster is randomly selected. 
All of the records in the selected cluster are then measured and included in the final sample.
● Multistage sampling: In multistage sampling, more than one of the sampling methods mentioned previously is utilized.

There are four types of nonprobability sampling methods.

● Quota sampling: In quota sampling, a sample is selected based on a predefined quota. For example, given a quota of a 50:50 female-to-male ratio and a total of 1000 subjects, 500 females and 500 males will be selected.
● Convenience sampling: In convenience sampling, samples are drawn from data sources that are easy, or 'convenient,' to acquire. For example, in clinical trials or shopping mall intercept surveys, respondents are acquired based on their availability. Availability does not guarantee that the sample is representative of the population, however.
● Judgment sampling: In judgment sampling, the sampler has a predefined set of characteristics on which the sampling is based. For example, in a mall intercept survey, the samplers may target adults between 20 and 30 years of age.
● Snowball sampling: In snowball sampling, the sampler relies on 'viral marketing' or 'word of mouth' to increase the sample size. In this case, the original sample size may be small, but the sample grows as those sampled refer people they know to the sampling process.

Sample size

One frequently asked question in survey research is how big the sample size should be. In general, we assume that the data has a normal distribution and select a sample size that will give the desired result within a given confidence level, such as 95%. A confidence level is the percentage of time the result is expected to be accurate and not due to chance.

Sample size based on sample mean

In this section, we discuss the derivation of an appropriate sample size that allows for proper estimation of the mean of an attribute in a sample. The first step in the derivation is to determine an acceptable standard error of the estimated attribute mean, denoted E. For instance, we may accept a standard error of 0.5 years when estimating the mean age of the persons in a sample. The second step is to assess the standard deviation of age in the population, denoted \sigma. Let us assume that the standard deviation of age is 3 years. The third step in the derivation is to identify the Z score (the concept of the Z score is discussed in Chapter 6) for a predetermined confidence level, such as 95%. In a normal distribution, the Z score at a 95% confidence level is 1.96. The fourth step of the derivation is to compute the appropriate sample size, obtained from the following formula:

n = \frac{Z^2 \sigma^2}{E^2} = \frac{(1.96)^2 (3)^2}{(0.5)^2} \approx 138    (5.6)

In this example, the appropriate sample size is 138.

Sample size based on sample proportion

This section discusses the process of deriving the appropriate sample size for estimating the percentage of voters with a particular voting disposition. The first step in the derivation is to estimate the general voting disposition of the population, p, such as 45% voting for party X. The second step in the derivation is to determine a standard error E, such as 0.5%. The third step in the derivation is to identify the Z score for a given confidence level, such as 95%. In a normal distribution, the Z score at a 95% confidence level is 1.96. The fourth step is to compute the appropriate sample size from the following formula:

n = \frac{Z^2\, p(1-p)}{E^2} = \frac{(1.96)^2 (0.45)(0.55)}{(0.005)^2} \approx 38{,}032    (5.7)

The appropriate sample size n is 38,032.
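As a quick illustration, the following hypothetical sketch (assuming numpy is available) reproduces the simple random sampling procedure described earlier in this section and the two sample-size formulas above; the printed values match the worked examples of 138 and 38,032.

```python
# Sketch of two calculations from this section; all data is a stand-in.
import numpy as np

# 1) Simple random sampling as described above: assign every record a uniform random
#    number and keep the records with the 1000 largest keys.
rng = np.random.default_rng(0)
population = np.arange(100_000)                  # stand-in for 100,000 customer records
keys = rng.uniform(size=population.size)
sample = population[np.argsort(keys)[-1000:]]    # 1000 records with the highest keys
print(len(sample))                               # -> 1000

# 2) Sample size for estimating a mean (Eq. 5.6) and a proportion (Eq. 5.7).
def n_for_mean(z: float, sigma: float, e: float) -> float:
    return (z * sigma / e) ** 2

def n_for_proportion(z: float, p: float, e: float) -> float:
    return z ** 2 * p * (1 - p) / e ** 2

print(round(n_for_mean(1.96, 3, 0.5)))             # ~138, the age example in the text
print(round(n_for_proportion(1.96, 0.45, 0.005)))  # ~38,032, the voting example
```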
■ Research report and results presentation

It is important to deliver a final research report or presentation that clearly addresses the initial business objectives. Data and information are important, but actionable recommendations are even more crucial. One common mistake in research reporting is the presentation of an abundance of data and charts with no actionable recommendations. The following is a framework for the basic structure of an effective research report or presentation.

Structure of a research report

● Background: This section gives an overview of the project background. This overview should be consistent with the original proposal and the RFP.
● Outline: The outline section consists of the topics the report discusses. The objective of the outline is to give the reader a clear idea of what to expect throughout the report.
● Executive summary: This is one of the most important sections of the whole report. Busy executives often scan only the summary section to determine whether the report is worth further reading. It must be factual and must provide the answers needed to address the project objectives. The executive summary must also highlight a set of practical and actionable recommendations.
● Research methodology: This section needs to clearly state the research methodology employed. For example, if a survey is included in the study, the survey mechanism, such as direct mail or e-mail, needs to be clearly stated. If questionnaires are involved, they should be included in an appendix.
● Data sources: In this section, the source of the data and the sample size, if applicable, need to be specified. A clear description of the data attributes needs to be given, and the data collection methods need to be stated as well. Detailed information about the data can be included in an appendix.
● Key findings: Key findings are a compilation of results in more granular detail than what is presented in the executive summary. It is a good idea to break the findings down into different sections and have a summary for each section. This helps the reader not get mired in data and numbers.
● Recommendations: Recommendations must be actionable and practical, and they need to be fact-based and analysis-driven.
● References and acknowledgments: Acknowledgments need to be given when quoting a data source or a piece of analysis that one does not have ownership of. Written permission from the owners of the data sources may be required.
● Appendix: In this section, we can insert additional information such as questionnaires, anecdotal commentary, and detailed information on raw data.

■ References

Blankenship, A.B., and G.E. Breen. State of the Art Marketing Research. AMA/NTC Business Books, Chicago, Illinois, 1995.
comScore Networks. comScore Media Metrix. comScore press release, Reston, Virginia, February 28, 2006.
Green, P., D.S. Tull, and G. Albaum. Research for Marketing Decisions, 5th ed. Prentice-Hall, Englewood Cliffs, New Jersey, 1988.
Hesselbein, F., R. Johnston, and the Drucker Foundation. On Leading Change: A Leader to Leader Guide. Jossey-Bass, San Francisco, California, 2002.
House, R.J., P.J. Hanges, M. Javidan, and P.W. Dorfman. Culture, Leadership, and Organizations: The GLOBE Study of 62 Societies. Sage Publications, Thousand Oaks, California, 2004.
113 This page intentionally left blank CHAPTER 6 Data and Statistics Overview This page intentionally left blank This chapter gives an overview of data and basic statistics with an emphasis on the data types and distributions that drive the selection of data mining techniques particularly relevant to quantitative marketing research. ■ Data types The data we are concerned with results from assigning values (historical or hypothetical) to variables used in statistical analysis. Therefore, when we refer to data types, we also refer to the types of the variables the data originates from. There are two data types: non-metric data and metric data. Within each data type, there are sub data types. Under the nonmetric data type, there are three sub types: binary, nominal (categorical), and ordinal. A binary variable has only two possible values. Whether or not a survey has received a response can be characterized by a binary variable with only two possible values: ‘response’ and ‘no response’. A nominal or categorical variable can have more than two values. For example, a variable ‘income group’ can have three possible values: ‘high income’, ‘medium income’, and ‘low income’. The values of a nominal or categorical variable are given for identification purposes rather than for quantification. Group number, a variable used to identify specific groups, is an example of nominal variable. An ordinal data type differs from binary or nominal types in that it denotes an order or ranking. Ordinal data does not quantify the difference between any two rankings, however. Assume we rank three movies on the basis of their popularity. ‘Spiderman II’, ‘Sweet Home Alabama’, and ‘Anger Management’ have the ranking of first, second, and third, respectively. ‘Spiderman II’ is more popular than ‘Sweet Home Alabama’ but the ranking does not provide any information on the difference in popularity between the two movies. Metric data is also called numeric data and can be either discrete or continuous. A discrete variable, such as the age of persons in a population, takes on finite values. A continuous variable is a variable for which, within the variable range limits, any value is possible. Time to complete a task is a variable with a continuous data type. ■ Overview of statistical concepts This section provides an overview of fundamental statistical concepts including a number of basic data distributions. 118 Data Mining and Market Intelligence Population, sample, and the central limit theorem A population is a set of measurements representing all measurements of interest to a sample collector (Mendenhall and Beaver 1991). The population of females aged between 35 and 40 in New York City consists of all females in that age range in New York City. A sample is a subset of measurements selected from a population of interest. For example, a sample may consist of 500 women aged between 35 and 40, randomly selected from the various boroughs in New York City. In this example, the sample size is 500. According to the central limit theorem, the distribution of the sum of independent and identically distributed random variables tends to a normal distribution as the number of such variables increases indefinitely (Gujarati 1988). The concept of normal distribution will be discussed in the data distribution section in this chapter. Random variables In what follows we are interested in data that can be modeled as random variables. A random variable is a mathematical entity whose value is not known until an experiment is carried out. 
In this context, carrying out an experiment means observing the occurrence of an event and associating a numerical value to the event. For example, consulting a news report to determine whether the stock market went up from yesterday to today is an experiment and the fact that the market went up or down is the event in question. The amount by which the stock market went up or down is the value associated with the event just described. This value is the realization of a random variable. In this case, the random variable is the stock market change between yesterday and today. Random variables are called discrete if they take on discrete values, or continuous if they take on continuous values. The Growth Domestic Product (GDP) is an example of a continuous random variable, whose value becomes known when the GDP is reported. The number of customers per hour that visit a store is an example of a discrete random variable. Next we review basic concepts of probability that we need in order to model data as random variables. Probability, probability mass, probability density, probability distribution, and expectation Probability is the likelihood of the occurrence of an event. Since random variables are numerical values associated with events, in what follows we Data and Statistics Overview will simply refer to probability as the likelihood that a random variable realizes (or takes on) a particular value. If the random variable is discrete, the probability that it will take on a particular value is given by its so-called probability mass. The probability mass is a positive number less or equal to one. The probability mass that a discrete random variable X takes the value xj is denoted by p(xj). To describe continuous random variables we need the concept of probability density. If X is a continuous random variable, the probability that X takes on a value within the interval x and x dx is given by p(x) dx, where dx is a differential and f(x) is the probability density function. Notice the distinction between X, the random variable, and x, the values X can take. The probability distribution function (not to be confused with the density function) is the probability that a random variable will take on values less or equal to a particular value. If the random variable is discrete and can take on n values, its probability distribution function is defined as follows. ji P(xi ) ∑ p(x j ) j1 (6.1) If the random variable is continuous, the probability distribution function is defined as follows: x ∫ F( x ) f (s)ds (6.2) xmin where xmin is the minimum value random variable X can take. In the case of a discrete variable X that can take on n values, the expectation is defined as follows. in E(X ) ∑ xi p(xi ) i 1, 2, 3 , … , n (6.3) i1 If random variable X is continuous, its expectation is given by the formula xmax E(X ) ∫ xf (x )dx (6.4) xmin where xmin and xmax are the smallest and largest values the continuous random variable X can take. The expectation is usually denoted by the Greek letter . 119 120 Data Mining and Market Intelligence Mean, median, mode, and range There are two types of means: arithmetic and geometric. The properties we discuss next are defined for both continuous and discrete random variables. For simplicity, however, in this section we focus on the discrete case. In the case of discrete random variable, X, the arithmetic mean is given by the average of its possible values. 
\bar{X}_a = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad i = 1, 2, 3, \ldots, n    (6.5)

If the values 2, 6, 8, 9, 9, and 11 are instances of the random variable X, its mean is

\bar{X}_a = \frac{2 + 6 + 8 + 9 + 9 + 11}{6} = 7.5

The geometric mean of a discrete random variable is given by the geometric average of its possible values:

\bar{X}_g = \sqrt[n]{x_1 x_2 \cdots x_n} = \left(\prod_{i=1}^{n} x_i\right)^{1/n}    (6.6)

For the same realized values of X, the geometric mean is

\bar{X}_g = \sqrt[6]{2 \cdot 6 \cdot 8 \cdot 9 \cdot 9 \cdot 11} \approx 6.64

The median is the value in the middle position in a sorted array of values. If there are two values in the middle position, the median is the average of these two values. The median in the example we are discussing is 8.5. The mode is the number that appears most frequently in a group of values. In our example, the mode is 9. The range is the difference between the largest and the smallest values in a group of values. In our example, the range is 9, the difference between 2 and 11.

Variance and standard deviation

The variance of a population of N observations, \sigma^2, is the mean of the squares of the deviations of the observations from the population mean \mu:

\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2    (6.7)

The standard deviation of a population is the positive square root of the population variance:

\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2}    (6.8)

The standard error of the mean of a sample of size N is given by the expression

\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{N}}    (6.9)

The variance of a sample of size n < N is defined in the same way as the variance of the population, with N replaced by n. However, especially when the sample size is small, it is preferable to use an alternative expression for the variance of the sample in which n is replaced by n - 1, as follows:

s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2    (6.10)

The reason this formula is preferred is that s^2 is an unbiased estimator of the population variance. The corresponding expression for the standard deviation of a sample is (Mendenhall and Beaver 1991)

s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2}    (6.11)

In our example of the sample with six realized values, 2, 6, 8, 9, 9, and 11, the variance of the sample is 9.9 and the standard deviation of the sample is 3.15.

Percentile, skewness, and kurtosis

As before, we focus on the discrete case for simplicity. The p percentile is a value such that p% of the observations in a sample have a value less than this value. The skewness gives an indication of the deviation from symmetry of a density function (Rice 1988):

\text{Skew} = \frac{\tfrac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^3}{s^3}    (6.12)

The kurtosis characterizes the tails of a distribution (Rice 1988):

\text{Kurtosis} = \frac{\tfrac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^4}{s^4}    (6.13)

In our example of the sample with six realized values, 2, 6, 8, 9, 9, and 11, the skewness of the sample is -0.642 and the kurtosis of the sample is 1.84. We often use the normal density as a reference to characterize the size of the tails by defining the excess kurtosis. Since the kurtosis of the normal density function is equal to three, the excess kurtosis is given by

\text{Excess kurtosis} = \frac{\tfrac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^4}{\sigma^4} - 3    (6.14)
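The worked example above is easy to verify numerically. The following short check (assuming numpy is available) recomputes the statistics for the values 2, 6, 8, 9, 9, and 11 using the formulas as stated, with the n - 1 sample standard deviation s in the denominators of the skewness and kurtosis.

```python
# Quick check of the worked example: values 2, 6, 8, 9, 9, 11, using Eqs. (6.5)-(6.13)
# as stated (moment numerators divided by n, denominators using the n-1 sample std s).
import numpy as np

x = np.array([2, 6, 8, 9, 9, 11], dtype=float)

mean = x.mean()                                   # 7.5
gmean = np.exp(np.log(x).mean())                  # ~6.64 (geometric mean)
median = np.median(x)                             # 8.5
s2 = x.var(ddof=1)                                # 9.9   (sample variance, n-1)
s = np.sqrt(s2)                                   # ~3.15
skew = ((x - mean) ** 3).mean() / s ** 3          # ~-0.642
kurt = ((x - mean) ** 4).mean() / s ** 4          # ~1.84

print(mean, round(gmean, 2), median, round(s2, 2), round(s, 2),
      round(skew, 3), round(kurt, 2))
```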
Probability density functions

The probability density function defines the distribution of probability among the different realizable values of a random variable. This section gives an overview of the probability density functions of eight commonly used data distributions: the uniform, binomial, Poisson, exponential, normal, chi-square, Student's t, and F distributions.

Uniform distribution

A random variable with a uniform distribution has a constant probability density function. If a and b are the minimum and maximum values the random variable can take, the uniform density function is

f(x) = \frac{1}{b - a}    (6.15)

The expectation and variance of x are given by

\mu = \frac{a + b}{2}, \qquad \sigma^2 = \frac{(b - a)^2}{12}

Normal distribution

The probability density of a normally distributed random variable is

f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x - \mu)^2 / 2\sigma^2}    (6.16)

where -\infty < x < \infty, \mu is the expectation, and \sigma is the standard deviation. The normal density can be standardized by rescaling the random variable as follows:

Z = \frac{x - \mu}{\sigma}    (6.17)

The density function of Z is normal with zero expectation and unit variance. The notation Z ~ N(0, 1) is used to denote that Z is a random variable drawn from a standardized normal distribution.

Binomial distribution

The probability density function of a binomial random variable is as follows (Mendenhall and Beaver 1991):

f(x) = \frac{N!}{x!\,(N - x)!}\, p^x q^{N - x}    (6.18)

A binomial event has only two outcomes. For example, an undertaking whose outcome can be described as success or failure can be characterized by a binomial distribution. The integer value x is the number of successes in a total of N trials, where 0 \le x \le N. The success outcome has probability p and the failure outcome has probability q = 1 - p. The expectation and variance of a binomial random variable are

\mu = Np, \qquad \sigma^2 = Np(1 - p)

Poisson distribution

A Poisson distribution characterizes the number of occurrences of an event in a given period of time. This distribution is appropriate when there is no memory affecting the likelihood of the number of events from period to period. The probability density function of a Poisson distribution is as follows:

f(x) = \frac{\lambda^x e^{-\lambda}}{x!}    (6.19)

The variable x represents the number of event occurrences during a given period of time, during which on average \lambda events occur. Both the expectation and the variance of x are equal to \lambda.

Exponential distribution

Exponential random variables characterize inter-arrival times in Poisson-distributed events. The probability density function of an exponential distribution is as follows:

f(x) = \lambda e^{-\lambda x}    (6.20)

where \lambda > 0, and the expectation and variance are given by

\mu = \frac{1}{\lambda}, \qquad \sigma^2 = \frac{1}{\lambda^2}

The exponential distribution reflects the absence of memory in the inter-arrival times of Poisson-driven events.

Chi-square (\chi^2) distribution

A chi-square density characterizes the distribution of the sum of the squares of independent standardized normally distributed random variables, Zi:

\chi^2_k = \sum_{i=1}^{k} Z_i^2    (6.21)

Here, k is the number of degrees of freedom as well as the number of independent standardized normal random variables. The probability density function of the chi-square distribution is as follows:

f(x) = \frac{x^{\frac{k}{2} - 1}\, e^{-\frac{x}{2}}}{2^{\frac{k}{2}}\, \Gamma\!\left(\frac{k}{2}\right)}    (6.22)

The gamma function, \Gamma, is defined as

\Gamma(\alpha) = \int_0^{\infty} t^{\alpha - 1} e^{-t}\, dt    (6.23)

and has the recursive property

\Gamma(\alpha + 1) = \alpha\, \Gamma(\alpha)    (6.24)

The mean of a chi-square random variable is equal to k, and its variance is

\sigma^2 = 2k

Student's t distribution

A Student's t distribution describes the ratio, t, of a standardized normally distributed random variable, Z1, to the square root of a \chi^2-distributed random variable, Z2, divided by its degrees of freedom (Gujarati, 1988):

t = \frac{Z_1}{\sqrt{Z_2 / n}}    (6.25)

The probability density function of t is

f(x) = \frac{\Gamma\!\left(\frac{n + 1}{2}\right)}{\sqrt{n\pi}\, \Gamma\!\left(\frac{n}{2}\right)} \left(1 + \frac{x^2}{n}\right)^{-\frac{n + 1}{2}}    (6.26)

The mean of a t distribution is zero.
Its variance is 2 k k2 F distribution The F distribution describes the ratio of two 2 distributed random variables Z1 and Z2,with k1 and k2 degrees of freedom, respectively Z1 k F 1 Z2 k2 (6.27) 125 126 Chapter 6 The probability density function is as follows. f (x ) (k1 x )k1 k 2 k2 (k1 x k 2 )k1k2 ⎛k k ⎞ xB ⎜⎜ 1 , 2 ⎟⎟⎟ ⎝2 2⎠ (6.28) ⎛k k ⎞ k k Here, B ⎜⎜ 1 , 2 ⎟⎟⎟ is the beta function with parameters 1 , 2 . ⎜⎝ 2 2 ⎠ 2 2 The expectation and variance of an F-distributed variable are 2 k2 k2 2 for k 2 2k22 (k1 k2 2) k1 (k2 2)2 (k2 4) The variance does not exist when k2 2 for k2 4 (6.29) 4. Independent and dependent variables An independent variable is also called a predictive variable. Prediction in this context means estimating the possible value of a dependent variable with a given level of confidence. A dependent variable is also called an outcome variable. Covariance and correlation coefficient Covariance measures the level of co-variability between two random variables. If X and Y are random variables, their covariance is defined by the expression cov(X ,Y ) E((x x )(y y )) E(xy ) x y (6.30) where x and y are the mean of X and Y, respectively. A correlation coefficient between two variables X and Y gives an indication of the level of linear association between the two variables. There are several standard formulations for level of association, of which the Pearson correlation coefficient is the most popular. The Pearson correlation coefficient, Data and Statistics Overview r cov(X ,Y ) xy (6.31) measures the level of linear association between two random variables. Here, x and y are the standard deviations of X and Y, respectively. Besides the Pearson correlation coefficient, the Kendall’s coefficient of concordance and the Spearman’s rank correlation coefficient are also commonly used measures of association for numeric variables. The Pearson’s coefficient of mean square contingency and the Cramer ’s contingency coefficient are used to measure association between nominal variables. The Kendall-Stuart c, the Goodman–Kruskal , and Sommer ’s d are used to measure association between ordinal variables (Liebetrau, 1983). Kendall’s coefficient Two pairs of observations (Xi,Yi) and (Xj,Yj) are said to be concordant if (XiXj)(YiYj) 0. They are said to be discordant if (XiXj)(YiYj) 0. They are said to be tied if (XiXj)(YiYj) 0. Kendall’s coefficient of concordance is defined as where c d (6.32) d is the probability of concordance c and c P[(Xi X j )(Yi Yj ) 0] is the probability of discordance d P[(Xi X j )(Yi Yj ) 0] Given the probability of ties, t P[(Xi X j )(Yi Yj ) 0] the following condition must be satisfied c d t 1 The following are two alternative estimations of the Kendall coefficient of concordance (Liebetrau, 1983). (C D) 2(C D) ⎛ n ⎞⎟ n(n 1) ⎜⎜ ⎟ ⎜⎝ 2 ⎟⎠ (6.33) 127 128 Data Mining and Market Intelligence where C is the number of concordant pairs, D is the number of discordant pairs, and n is the total number of pairs. An alternative expression (Liebetrau, 1983) is ( C D) ⎫ ⎧⎪ ⎡⎛ n ⎞ ⎪⎨ ⎢⎜ ⎟⎟ U ⎤⎥ ⎡⎢⎛⎜ n ⎞⎟⎟ V ⎤⎥ ⎪⎪⎬ ⎜ ⎜ ⎥⎦ ⎪⎪ ⎥⎦ ⎢⎣⎝⎜ 2 ⎠⎟ ⎪⎪⎩ ⎢⎣⎝⎜ 2 ⎠⎟ ⎭ 1 (6.34) 2 ⎛n ⎞ ⎛m ⎞ where U ∑ ⎜⎜⎜ i ⎟⎟⎟, V ∑ ⎜⎜ j ⎟⎟⎟ , and mi is the number of observations ⎜2 ⎠ ⎝2 ⎠ i j ⎝ in the ith set of the X variables, and nj is the number of observations in the jth set of the Y variables. Spearman’s rank correlation coefficient is similar to Pearson’s except that the former is based on ranks rather than on values. Ranks are determined by the relative values of the numbers in a series. 
In a series of N numbers, the largest number has a rank of one, the second largest number has a rank of two, and the smallest number has a rank of N. ∑ i1 (Ri R)(Si S ) n b {∑ n (Ri i1 R)2 ∑ i1 (Si S )2 n } 1 2 (6.35) Where Ri is the rank of Xi among the X s, Si is the rank of Yi among the Y s, and n is the total number of pairs. Both Pearson’s coefficient of mean square contingency and Cramer ’s contingency coefficient are based on the following chi-square statistics of a contingency table. I J x2 ∑ ∑ i1 j1 (nij npij )2 npij (6.36) Contingency tables display frequency data in a two-by-two cross tabulation and are used by researchers to examine the independence of two methods of classifying the data. For instance, a group of individuals can be classified by whether they are married and whether they are employed. In this case marriage and employment are the two methods of classifying the individuals. Figure 6-1 shows a typical contingency table. Pearson’s coefficient of mean square contingency is a statistic used to measure the deviation of the realized counts from the expected counts for determining the independence of the two classification methods. The formula for the Pearson’s coefficient is as follows (Liebetrau 1983). Data and Statistics Overview Y 1 2 1 2 n11 . . . J Totals n12 n1j n1 p11 p12 p1j p1 n21 n22 n2j n2 p21 p22 p2j p2 . X . . I Totals ni1 ni2 nij ni pi1 pi2 pij pi n1 n2 nj n n p1 p2 pj p 1 Figure 6-1 Visualization of a two-way contingency table (Liebetrau 1983). I J 2 ∑ ∑ i1 j1 ( pij pi p+ j )2 pi+ p+ j I J ∑∑ i1 j1 pij 2 pi pj 1 (6.37) Cramer ’s contingency coefficient, given by Eq. (6.38), measures the association between two variables as a percentage of their maximum possible variation. ⎛ φ 2 ⎞⎟1 / 2 ⎟ ν ⎜⎜⎜ , ⎜⎝ q 1 ⎟⎟⎠ (6.38) where q is min {I, J}. The Kendall-Stuart c, the Goodman–Kruskal , and Somers’ d are statistics that are derived from Kendall’s . 129 130 Data Mining and Market Intelligence The Kendall-Stuart c equals the excess of concordant over discordant pairs times another term representing an adjustment for the size of the table. Goodman and Kruskal’s gamma is a symmetric statistic that ranges from 1 to 1, based on the difference between concordant pairs and discordant pairs. Somers’ d is Goodman and Kruskal’s gamma modified to penalize for pairs tied only on X. Kendall-Stuart coefficient c 2q(C D) 2(C D) 2 [n2 (q 1)/ q] n (q 1) (6.39) Goodman–Kruskal’s coefficient c 1 d t c d 1 ∑ I i1 pi2 ∑ c d J p 2 j1 j ∑ i1 ∑ j1 pij 2 I J (6.40) Somers’ coefficient dY ,X d 1 tXY c X t c c d d Y t c 1 ∑ I i1 d pi2 (6.41) Here, tX is the probability that two randomly selected observations are tied only on X, and tXY is the probability that two randomly selected observations are tied on both X and Y. Tests of significance A significance test quantifies the statistical significance of hypotheses. We will follow the paradigm established by Neyman-Pearson to posit significance tests. In establishing a significance test, the probability distributions are grouped into two aggregates, one of which is called the null hypothesis, denoted by H0, and the other of which is called the alternative hypothesis, denoted as HA (Rice 1988). Null hypotheses often specify, or partially specify, the value of a probability distribution (Rice 1988). The acceptance area is the area under the probability density curve of the distribution specified by the null hypothesis. The rejection area is the area under the probability density curve of the distribution specified by the alternative hypothesis. 
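As a concrete illustration of a null hypothesis of independence tested against a contingency table, the following sketch (our own, with invented counts for the married-by-employed example mentioned above) computes the chi-square statistic of Eq. (6.36) and Cramer's contingency coefficient of Eq. (6.38) using scipy.

```python
import numpy as np
from scipy import stats

# Hypothetical 2x2 contingency table: rows = married / not married,
# columns = employed / not employed. The counts are invented for illustration.
counts = np.array([[120,  40],
                   [ 60,  80]])

# H0: the two classification methods are independent.
chi2, p_value, dof, expected = stats.chi2_contingency(counts, correction=False)

n = counts.sum()
phi2 = chi2 / n                      # Pearson's mean square contingency (phi squared), cf. Eq. (6.37)
q = min(counts.shape)                # min(I, J)
cramers_v = np.sqrt(phi2 / (q - 1))  # Cramer's contingency coefficient, Eq. (6.38)

print(f"chi-square = {chi2:.3f} (df = {dof}), p-value = {p_value:.4f}")
print(f"phi^2 = {phi2:.4f}, Cramer's coefficient = {cramers_v:.4f}")
print("Expected counts under independence:\n", expected)
```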
There are two types of significance tests: one-tailed and two-tailed. A one-tailed test specifies the rejection area under only one tail of the probability density curve of the test statistic. A two-tailed test specifies the rejection area under both tails of the probability density curve of the distribution of the test statistic (Mendenhall and Beaver 1991). For instance, a null hypothesis may state that the probability of getting ten successes in one hundred trials is 0.1. The alternative hypothesis in a one-tailed test may state that the probability of getting ten successes in one hundred trials is less than 0.1. A two-sided alternative hypothesis may state that the probability of getting ten successes in one hundred trials is less than 0.1 or greater than 0.1.

According to the Neyman-Pearson paradigm, a decision as to whether or not to reject H0 in favor of HA is made on the basis of T(X), where X denotes the sample values and T(X) is a suitable statistic (Rice 1988). This decision is affected by the error tolerance, which is defined by either an error of type I or an error of type II. A type I error consists in rejecting H0 when it is true. The probability of rejecting H0 when it is true is denoted by $\alpha$ and is called the significance level. In a one-tailed test, the probability of T(X) exceeding the critical statistic $T^*(X)$ is $\alpha$. The confidence level is $(1-\alpha)$, and the $100(1-\alpha)$ percent confidence interval is $T(X) \leq T^*(X)$. In a two-tailed test, the probability of T(X) exceeding the upper critical value $T^*(X)$ is $\alpha/2$, and the probability of T(X) falling below the lower critical value is also $\alpha/2$. A type II error occurs when we accept H0 when it is false. The probability of accepting H0 when it is false is denoted by $\beta$. The power of a test is $1-\beta$.

Z Test

In a Z test, the test statistic T(X) is defined as follows,

$$ Z = \frac{X - \mu}{\sigma} \qquad (6.42) $$

where $\mu$ is the mean of X, $\sigma$ is the standard deviation of X, and X is assumed to follow a normal distribution. A Z table shows the critical Z scores at a pre-specified significance level $\alpha$. If we assume a pre-specified $\alpha$ of 0.05 and a two-tailed test, then the shaded area under the probability density curve is 0.5 minus $\alpha/2$. This area, 0.475 (or 47.5%), corresponds to a Z score of 1.96, as shown in Figure 6-2.
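The Z test is straightforward to reproduce in code. The following is a minimal sketch, not from the text, assuming scipy; the null mean of 100, the known standard deviation of 15, and the sample itself are illustrative assumptions, and the sketch uses the familiar sample-mean form of the Z score.

```python
import numpy as np
from scipy import stats

# Hypothetical example: H0 says the population mean is 100 with known sigma = 15.
mu0, sigma = 100.0, 15.0
alpha = 0.05

rng = np.random.default_rng(42)
sample = rng.normal(loc=104, scale=sigma, size=50)  # made-up data with a true mean of 104

# Z score for the sample mean (sample-mean form of Eq. 6.42).
z = (sample.mean() - mu0) / (sigma / np.sqrt(len(sample)))

# Two-tailed critical value at alpha = 0.05 is 1.96 (cf. Figure 6-2).
z_critical = stats.norm.ppf(1 - alpha / 2)
p_value = 2 * (1 - stats.norm.cdf(abs(z)))

print(f"Z = {z:.3f}, critical Z = {z_critical:.3f}, p-value = {p_value:.4f}")
print("Reject H0" if abs(z) > z_critical else "Fail to reject H0")
```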
Figure 6-2 Z table (areas under the standard normal curve between 0 and z).

t Test

In a t test, T(X) is called a t score and is a statistic used to determine whether a null hypothesis can be rejected. When we conduct a t test, we assume that the data has a Student's t distribution. The t score is given by

$$ t = \frac{X - \bar{X}}{s} \qquad (6.43) $$

where $\bar{X}$ is the mean of X and s is the standard deviation of X. A t table shows the critical t score at a pre-specified significance level $\alpha$, parametrically in the number of degrees of freedom. If we assume a pre-specified $\alpha$ of 0.05 and a two-tailed test, then the shaded area under the probability density curve is 0.5 minus $\alpha/2$. To identify the critical t value, we need two pieces of information: the degrees of freedom, n − 1, and the pre-specified significance level $\alpha$. If we assume that the total number of observations in the sample is 30 and $\alpha$ is 0.05, the number of degrees of freedom of the sample is 29 and $\alpha/2$ is 0.025. With this information, in Figure 6-3 we identify the critical t value as 2.0452.

Figure 6-3 t table (critical values t(p, df) by tail probability p and degrees of freedom df).
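The critical value of 2.0452 can be reproduced in code, and the same tools run the full test. The following is a minimal sketch assuming scipy; the sample data and the null mean of 5.0 are invented for illustration, and the test uses the familiar sample-mean form of the t score.

```python
import numpy as np
from scipy import stats

alpha = 0.05
rng = np.random.default_rng(7)
sample = rng.normal(loc=5.4, scale=1.2, size=30)  # 30 made-up observations

# Critical t value for a two-tailed test with n - 1 = 29 degrees of freedom.
df = len(sample) - 1
t_critical = stats.t.ppf(1 - alpha / 2, df)  # about 2.0452, matching Figure 6-3

# One-sample t test of H0: the population mean equals 5.0.
t_stat, p_value = stats.ttest_1samp(sample, popmean=5.0)

print(f"critical t (df={df}) = {t_critical:.4f}")
print(f"t = {t_stat:.3f}, p-value = {p_value:.4f}")
print("Reject H0" if abs(t_stat) > t_critical else "Fail to reject H0")
```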
Experimental design

The main objective of experimental design is to ensure the validity of the conclusions from a study or survey. Experimental design is used to avoid study or survey design flaws that may skew the results. An experimental design is a process that seeks to discover the functional forms that relate the independent (predictive) variables and the dependent (outcome) variables in a study (Green, Tull, and Albaum 1988). Depending on the level of information available, an experimental design aims to accomplish any of the following tasks.

● Obtaining numeric parameter estimates only, if the statistical functional form is already known
● Building a model, if the statistical functional form is unknown
● Identifying the relevant variables (independent and dependent), if the statistical functional form is known but the variables are unknown.

Experimental design terminology

The following is a list of frequently used terms in experimental design (Green, Tull, and Albaum 1988).

● Units: Units are individuals, subjects, or objects.
● Treatments: Treatments are the independent (or predictive) variables in an experimental design, calibrated to observe potential causality.
● Control units: These are objects, individuals, or subjects that are not subjected to any treatment. A group that consists of control units is called a control group.
● Test units: These are objects, subjects, or individuals that are given a particular treatment. A group that consists of test units is a treatment or test group.
● Natural experiment: An experiment that requires minimum intervention and no calibration of variables.
● Controlled experiment: An experiment that requires an investigator's intervention and calibration of variables to discover a causal effect. Two kinds of intervention are necessary: random placement of subjects into a control or a treatment group, and calibration of at least one variable assumed to be causal. Experiments that meet both intervention conditions are called true experiments.
● Quasi-experiment: Experiments that contain manipulation of at least one assumed causal variable, but do not have random assignment of subjects into control or experiment groups.
● Block: A block is a group of similar units of which roughly equal numbers of units are assigned to each treatment group.
● Replication: Replication is the creation of repeated measurements in a control or treatment group.
● Completely randomized design: This is a design where test units are assigned experimental treatments on a random basis.
● Full factorial design: This type of design assigns an equal number of observations to all combinations of the treatments involving at least two levels of at least two variables.
● Latin square design: Latin square design is a technique for reducing the number of observations required in a full factorial design.
● Cross-over design: In this design, different treatments are applied to the same test unit in different periods of time.
● Randomized-block design: This design is usually used when a researcher needs to eliminate a possible source of error. In this case, each test unit is regarded as a 'block' and all treatments are applied to each of these blocks.

■ References

Green, P.E., D.S. Tull, and G. Albaum. Research for Marketing Decisions, 5th ed. Prentice Hall, Englewood Cliffs, New Jersey, 1988.
Gujarati, D.N. Basic Econometrics, 2nd ed. McGraw-Hill, New York, 1988.
Liebetrau, A.M. Measures of Association. Quantitative Applications in the Social Sciences, Sage Publications, Thousand Oaks, CA, 1983.
Mead, R. The Design of Experiments: Statistical Principles for Practical Application. Cambridge University Press, New York, 1988.
Mendenhall, W., and R. Beaver. Introduction to Probability and Statistics, 8th ed. PWS-Kent Publishing Company, Boston, MA, 1991.
Rice, J.A. Mathematical Statistics and Data Analysis, Statistics/Probability Series. Wadsworth & Brooks/Cole, Belmont, CA, 1988.

CHAPTER 7 Introduction to Data Mining

A wide variety of data mining approaches have been developed to address a broad spectrum of business problems. Techniques such as logistic regression are used for building targeting models, and approaches such as association analysis are used for building cross-sell or up-sell models. Effective use of data mining to identify potential revenue opportunities along the sales pipeline may result in higher returns on investment and the creation of a competitive advantage. The objective of this chapter is to introduce the fundamentals of the most commonly used data mining techniques. Chapters 8–10 discuss several case studies based on some of these techniques.

■ Data mining overview

We define data mining in terms of

● The use of statistical or other analytical techniques to process and analyze raw data to find meaningful patterns and trends
● The extraction and use of meaningful information and insight to produce actionable business recommendations and decisions

The focal point of effective data mining is to analyze data in order to make actionable business recommendations.
Without the latter, data mining is an intellectual exercise with no real life application. In our experience, insufficient focus on actionable business recommendations is often the main reason that data mining may not have been as widely adopted by some organizations as would have been desirable. Data mining can be applied to the solution of a broad range of business problems. The following is a list of standard applications of data mining techniques. ● Development of customer segmentation Customer segmentation and profiling analysis constitutes the first step toward understanding the target audience. Understanding of the target audience drives effective advertising, offers, and messaging. As we pointed out in Chapter 5, marketing plan objectives determine a variety of segmentation types. For example, if the marketing plan calls for the creation of segments with differentiated needs, then needbased segmentation is required. Common segmentation types are: – Need-based segmentation – Demographics-based segmentation – Value-based segmentation – Product purchase-based segmentation – Profitability-based segmentation 140 Data Mining and Market Intelligence ● ● ● ● ● ● ● ● ● Customer profiling Profiling analysis creates descriptions of segments by their unique characteristics and attributes. For instance, a segment profile may consist of attributes such as an age range of thirty-five and older, and an annual household income of $75,000 or higher. New customer acquisition New customer acquisition can be costly. Predictive targeting models built with data mining techniques allow us to effectively target prospects with the greatest propensity of converting to customers. Minimization of customer attrition or churn Attrition among existing customers results in immediate revenue loss. Data mining can be used to predict future attrition and mitigate customer defection by understanding the factors responsible for attrition. Maximization of conversion Increase in conversions of responders to leads and leads to buyers can expedite the sales process. High conversion is one of the keys to high investment returns. Data mining can help understand the primary conversion drivers. Cross-selling and up-selling of products Experience shows that it is much more expensive to generate revenues from new customers than from existing customers. The main reason is that building relationships and trust is costly and time-consuming. Data mining can be used to build models for quantifying additional products and services sales to existing customers. Personalization of messages and offers Personalized messages and offers tend to solicit higher response rates from the target audience than generic ones. Data mining techniques such as collaborative filtering can be used to create real-time, personalized offers and messages. Inventory optimization Data mining facilitates the determination of more accurate forecasts of inventory needs, avoiding unnecessary waste due to over or under stockpiling of inventory. Forecasting marketing program performance It is a common need of firms to forecast revenues, responses, leads, and web traffic. Data mining techniques such as time series and multivariate regression analyses can be applied to address such forecasting needs. Fraud detection The federal government of the United States was an early adopter of data mining technology. 
As part of the investigation of the Oklahoma bombing crime in 1995, the FBI used data mining analysis to sift through thousands of reports submitted by agents in the field looking Introduction to Data Mining for connections and possible leads (Berry and Linoff 1997). In the credit industry, customer fraud can cause significant financial damage to lenders. Predictive modeling can be applied to address this issue by modeling the probability of fraud at an individual level. ■ An effective step by step data mining thought process Figure 7-1 illustrates a step-by-step thought process for data mining. Step 1 Identify business objectives and goals Step 5 Identify data sources Step 2 Step 3 Determine key business areas and metrics to focus Translate business issues to technical problems Step 6 Perform analysis Step 4 Select appropriate data mining techniques and software tools Step 7 Translate Analytical results into actionable business recommendations Figure 7-1 Effective data mining thought process. Step one: identification of business objectives and goals The first step of the process is to identify the objectives and the goals of a marketing effort. Objectives are something defined at a more abstract level and in a less quantitative manner than goals, which are usually quantifiable. For example, a business objective may be to increase sales of the current fiscal year and the goal may be to increase the sales of the 141 142 Data Mining and Market Intelligence current fiscal year by 15% given the same amount of investment as that of the last fiscal year. Step two: determination of the key focus business areas and metrics Once the objectives and goals of a marketing effort have been identified, the next step is to determine on which business areas to focus and what metrics to use for measuring returns. For instance, incremental marketing returns may come from the existing customer base, from new customers, from an increase in the efficiency of marketing operations, or from a reduction in the number of fraud cases. Consider an online publisher whose main revenue source is advertising. Advertising revenue is based on cost per thousand impressions (CPM). The cost of having an ad exposed to the one million impressions is $1000 at a CPM of $1. Assume the publisher has as an objective to increase traffic (impressions) to his web site and a goal of increasing his advertising revenue by 15% over the next three months. There are several business areas that the publisher can focus on to accomplish his goal. The following are four examples of marketing efforts that the publisher may consider. ● ● ● ● An increase in his investment in search marketing to drive traffic to his site Advertisement of his site via online banners on others’ sites Launching a promotional activity such as a sweepstake on the radio to drive traffic to his site Deployment of an online blog on his site to increase traffic volume. The choices are numerous and the publisher needs to select a main focus by assessing the advantages and disadvantages of each option. The metrics used to measure the success of a marketing effort need to be consistent with the business goals. In this example, if the goal is to increase advertising revenue by 15% over the next three months then the appropriate return metric is clearly the advertising revenue. Step three: translation of business issues into technical problems The third step in an effective data mining thought process is the translation of business issues into technical problems. 
Wrong translation will lead to waste of resources and opportunities. Continuing with the online Introduction to Data Mining publisher example: If the focus business area is to advertise on other web sites to promote his own, the publisher needs to determine which of those sites are most appropriate. As an example of a technical answer to the question of where the publisher can place his advertisement, consider the following approach. The publisher can segment the traffic to his web site by referral sources. A referral site or source is the origination site that leads particular visitors to the publisher ’s site. Some visitors may have arrived at the publisher ’s site via a Google search or a Yahoo search. In this case, either the Google or the Yahoo search is the referral source. The publisher can then determine which of the referral sources are the best traffic sources based on traffic volume, traffic growth, visitor profile, and cost. He can then emphasize investment in the most effective referral sources and actively look for new referral sources with similar characteristics. All of the above analysis requires a web analytic tool. Therefore, expertise in the selection, deployment, and creation of reports from a web analytic tool needs to be brought into the analysis process. Step four: selection of appropriate data mining techniques and software tools Data mining techniques are based on analytic methods or algorithms, such as logistic regressions, and decision trees. Data mining software is an application that implements data mining techniques such that the user does not need to write the data mining algorithms he uses. SAS Enterprise Miner, IBM Intelligent Miner, SPSS Clementine, and Knowledge STUDIO are examples of data mining software. Step five: identification of data sources Once the appropriate data mining technique and software have been established, proper data sources need to be identified to effectively leverage data mining. For example, historical customer purchase data is required for conducting cross-sell or up-sell analysis. The following is a list of common data sources for data mining (Rud 2001). ● Internal sources – Customer databases – Transaction databases – Marketing history databases – Solicitation mail and phone tapes – Data warehouse 143 144 Data Mining and Market Intelligence ● External sources – Third-party list rentals – Third-party data appends Additional data sources are: ● ● ● ● ● ● ● ● ● ● ● ● Enterprise resource planning (ERP) systems Point of sales (POS) systems Financial databases Customer relationship management (CRM) systems Supply chain management (SCM) systems, such as SAP and People Soft Marketing research and intelligence databases Campaign management systems Advertising servers E-mail delivery systems Web analytic systems Web log files Call center systems After the appropriate data sources are identified, it is essential for make sure that the data is cleansed and standardized in preparation for data mining analysis. Step six: conduction of analysis There are three stages in data mining analysis: modeling building, model validation, and real life testing. A comprehensive analysis must include all three stages. Skipping the model validation and real life testing stages may increase the risk of rolling out unstable models. The data needs to be divided into two subsets, one for model building and the other for validation. Model building A subset of the available data is used for model building. 
A common practice is to use 50–70% of the data for model building and the remaining data for model validation. In general, several models are built and the best ones are chosen based on statistics measuring model effectiveness. If the R-squared (R2) statistic is used to evaluate the effectiveness of multiple regression models, then the model with the highest R2 is selected. Model validation After a subset of the data has been used for model building, the remaining data is used for validation. One common mistake is to build models and validate them on the same set of data. This is a serious error that can Introduction to Data Mining artificially inflate the power of the models and make validation results look much better than they actually are. It is very important to conduct out-of-sample (sample referring to the subset of data used for model building) validation. If a model works well with the validation test, it will likely be successful in real-life tests. Real-life testing The best way to tell if a model works is to try it out with a small-scale real life test. To test a targeting model, marketing promotions need to be sent to both a control group and a test group. A control group consists of a random selection of all available prospects. A test group consists of the prospects that the model predicts are more likely to respond. A comparison in the response rates between the test group and the control group provides insights into the robustness of a targeting model. If the test group has a higher response rate than that of a control and the result is statistically significant, then the model is robust. If the test fails, close examination of the model needs to be conducted to understand why the model does not work in a real-life situation. Real-life testing is a roll out prerequisite. Step seven: translation of analytical results into actionable business recommendations Inferring actionable business recommendations from model results requires explaining the main conclusions of the analysis in nontechnical terms. Throughout the various case studies from Chapter 8 to Chapter 10, we provide specific examples on how to translate analytical results into actionable business recommendations. ■ Overview of data mining techniques The foundations for the development of different data mining techniques are statistical objectives and available data types. There are two common statistical objectives, analysis of dependence and analysis of interdependence (Dillon and Goldstein 1984): ● ● Analysis of dependence: This type of analysis is used to explore relationships between dependent variables and independent variables. Analysis of interdependence: This method is used to explore relationships among independent variables. The two common underlying data types of independent and dependent variables are metric and nonmetric. 145 146 Data Mining and Market Intelligence Basic data exploration A preliminary data exploration is required before building any sophisticated data mining models. Occasionally, a basic data exploration is sufficient to address a business question. For instance, by regarding web traffic as a time series, we may be able to identify visible spikes in traffic pattern that coincide with particular marketing activities. By plotting one customer attribute such as income against customer behavior such as grocery purchases, we might spot a distinct pattern of correlation between income and grocery shopping. Data exploration should always be the first step before building any model. 
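The following is a minimal sketch of this kind of exploration, assuming pandas; the column names and the data are invented for illustration. It aggregates a daily web-traffic series so that spikes stand out, and checks the correlation between income and grocery spending.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

# Illustrative daily web-traffic series with a promotional spike in week 3.
days = pd.date_range("2008-01-01", periods=28, freq="D")
visits = rng.poisson(lam=1000, size=28).astype(float)
visits[14:18] += 1500  # pretend a campaign ran on days 15 to 18
traffic = pd.Series(visits, index=days, name="visits")
print(traffic.resample("W").sum())  # weekly totals make the spike easy to see

# Illustrative customer table: income versus grocery purchases.
customers = pd.DataFrame({"income": rng.normal(60_000, 15_000, size=500)})
customers["grocery_spend"] = 0.03 * customers["income"] + rng.normal(0, 400, size=500)
print("Pearson correlation:", customers["income"].corr(customers["grocery_spend"]).round(3))
```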
Variables that appear interesting and relevant at the data exploration stage will likely show up as significant contributors in the final model. Linear regression analysis Regression analysis is a technique for quantifying the dependence between dependent and independent variables. A particular type of regression analysis, linear regression, is most frequently used in data mining. In its more general formulation, linear regression establishes a linear relationship between the dependent variables and the so-called regression parameters, with the independent variables appearing in nonlinear functional forms. A particularly popular form of linear regression occurs when the relationship between dependent and independent variables is itself linear. In this case, linear regression with a single independent variable is called simple linear regression, and regression with several independent variables is called multiple linear regression. The linear technique is widely used to predict a single dependent variable (outcome variable) with one or multiple independent variables (predictive variables). Linear regression problems are addressed with Ordinary Least Squares (OLS) and Maximum Likelihood Estimation (MLE). Simple linear regression In simple linear regression, there is a single dependent variable and a single independent variable. There is an implied linear relationship between the two variables. Figure 7-2 is a graphical representation of this relationship. The mathematical formula for simple regression is Yi 0 1 Xi i (7.1) where i 1, 2, 3, 4, …., n, Xi is the value of independent variable X, and Yi is the value of dependent variable Y. 0 and 1 are regression parameters and i is the error term. The number of degrees of freedom is n 1. Introduction to Data Mining Y dy (xi,yi) dx εi dy β1 dx (0,β0) X Figure 7-2 Illustrating the linear relationship between X and Y. The estimator of Yi is denoted as Ŷi and is called the regression line. If ˆ0 is the estimator of 0 and ˆ1 is the estimator of 1, the regression line is given by the following expression Ŷi ˆ0 ˆ1 Xi (7.2) The estimator of 1 is (Neter, Wasserman, and Kutner 1990) 1 in ∑ i1 (Xi X )(Yi Y ) in ∑ i1 (Xi X )2 (7.3) where X and Y are the means of X and Y, respectively. The variance of ˆ1 is var( 1 ) σ2 in ∑ i1 (Xi X )2 where is the variance of variable Y. (7.4) 147 148 Data Mining and Market Intelligence The estimator of 0 can be shown to be (Neter, Wasserman, and Kutner 1990) ˆ0 Y ˆ1 X (7.5) The variance of ˆ0 is in s2 (ˆ0 ) 2 ∑ i1 Xi 2 (7.6) in n∑ i1 (Xi X )2 A significance test using t statistics can be applied to determine whether ˆ0 and ˆ1 are statistically significant. The t statistic for ˆ1 is t(ˆ0 ), expressed as ˆ0 / S(ˆ0 ). In the case where a 95% confidence level is used for the test, if t(ˆ0 ) is greater than t0.025 , n2 (ˆ0 ) or less than −t0.025 , n2 (ˆ0 ) , then ˆ is statistically significant. The 95% confidence 0 interval of ˆ0 is ⎡ in X ⎢ˆ ∑ i1 i ⎢ 0 t0.025 , n2 , ˆ0 t0.025 , n2 in ⎢ 2 ( ) n X X ∑ i1 i ⎢⎣ ⎤ in ⎥ ∑ i1 Xi ⎥ in n∑ i1 (Xi X )2 ⎥⎥ ⎦ The t statistic for ˆ1 is t(ˆ1) , expressed as t(ˆ1 ) ˆ1 s(ˆ1) (7.7) In the case where a 95% confidence level is used for the test, if t(ˆ1) is greater than t0.025 , n2 (ˆ1) or less than t0.025 , n2 (ˆ1) , then ˆ1 is statistically significant. 
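The estimates, t statistics, and confidence intervals described here can be read directly from a fitted model. The following is a minimal sketch, assuming statsmodels and synthetic data of our own, that fits the simple linear regression of Eq. (7.1) by ordinary least squares.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)

# Synthetic data consistent with Y_i = beta0 + beta1 * X_i + eps_i
n = 100
x = rng.uniform(0, 10, size=n)
y = 2.0 + 0.8 * x + rng.normal(0, 1.0, size=n)

X = sm.add_constant(x)          # adds the intercept column for beta0
model = sm.OLS(y, X).fit()      # ordinary least squares estimates of beta0, beta1

print(model.params)                 # estimated beta0, beta1
print(model.tvalues)                # t statistics for each estimate
print(model.conf_int(alpha=0.05))   # 95% confidence intervals
print(f"R-squared: {model.rsquared:.3f}")
```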
The 95% confidence interval of ˆ is given by 1 ⎡ ⎛ ⎜ ⎢ ⎢ ⎜⎜⎜ t ⎢ 1 ⎜ 0.025 , n2 ⎜⎜ ⎢ ⎝ ⎢⎣ ⎞⎟ ⎛ ⎜⎜ ⎟⎟ ⎟⎟ , 1 ⎜⎜ t0.025 , n2 ⎜⎜ in ⎟ ⎜⎝ ∑ i1 (Xi X )2 ⎟⎟⎠ σ ⎞⎟⎤ ⎟⎟⎥⎥ ⎟⎟⎥ in 2⎟ ⎟⎟⎥ ( X X ) ∑ i1 i ⎠⎥⎦ Introduction to Data Mining Key assumptions of linear regression Simple linear regression relies on four key assumptions that need to be satisfied for conclusions to apply (Neter, Wasserman, and Kutner 1990). Assumption 1: The mean of error term i, conditional on Xi is zero. E( i | Xi ) 0 where i 1, 2, 3, 4, …, n. Assumption 2: The covariance between the error terms, i s, is zero. cov( i , j ) 0 where i j, i 1, 2, 3, 4, …, n, and j 1, 2, 3, 4, …, n. Assumption 3: The variance of i is constant (a situation referred to as homoscedasticity.) var( i ) i 2 j 2 var( j ) where i j, i 1, 2, 3, 4, …, n, and j 1, 2, 3, 4, …, n. Assumption 4: The covariance between Xi and i is zero, namely, cov(Xi, i) 0 where i 1, 2, 3, 4, …, n. Multiple linear regression In multiple linear regression, there is a single dependent variable and more than one independent variable. We can describe a multiple regression model with p 1 independent variables as follows. Yi 0 1 X1i 2 X 2i 3 X 3 i p1 X p1, i i (7.8) for i 1, 2, 3, 4, …, n, and p 3. We can also express a multiple regression equation in matrix form (Dillon and Goldstein 1984). ⎛ Y1 ⎞⎟ ⎛⎜ 0 1 X11 2 X 21 3 X 31 p1 X p1, 1 1 ⎞⎟ ⎟⎟ ⎜⎜ ⎟ ⎜ ⎟ ⎜⎜ Y ⎟⎟ ⎜⎜ X X X X 1 12 2 22 3 32 p1 p1, 2 2 ⎟ ⎟⎟ ⎜⎜ 2 ⎟⎟⎟ ⎜⎜ 0 ⎟⎟ . ⎜⎜ . ⎟⎟ ⎜⎜ ⎟⎟ ⎜⎜ . ⎟⎟ ⎜⎜ . ⎟⎟ ⎜⎜ . ⎟⎟ ⎜⎜ ⎟⎟ . ⎜⎜⎝ Y ⎟⎟⎠ ⎜⎜ X X X X ⎟ ⎜⎝ 0 n 1 1n 2 2n 3 3n p1 p1, n n ⎟ ⎠ (7.9) 149 150 Data Mining and Market Intelligence Equation 7.9 can be expanded to the following. ⎛ Y1 ⎞⎟ ⎛⎜⎜ 1 X11 ⎜⎜ ⎟ ⎜ ⎜⎜ Y ⎟⎟ ⎜⎜ 1 X 12 ⎜⎜ 2 ⎟⎟⎟ ⎜⎜ ⎜⎜ . ⎟⎟ = ⎜⎜ . . ⎜⎜ . ⎟⎟ ⎜⎜ . . ⎜⎜ . ⎟⎟ ⎜⎜ . . ⎜⎝⎜ Y ⎟⎟⎠ ⎜⎜ n ⎜⎝ 1 X1n X 21 X 31 . . . . . X 22 . . . X2n X 32 . . . X3 n . . . . . . . . . . . . . . . . . . . . . . . . . X p1, 1 ⎞⎟ ⎛ 0 ⎞ ⎛ ⎞ ⎟⎟ ⎜ 1 ⎟⎟ ⎟⎟ ⎜⎜ ⎟⎟ ⎜⎜ ⎟ ⎟ ⎜⎜ ⎟ X p1, 2 ⎟⎟ ⎜⎜ 1 ⎟⎟ ⎜⎜ 2 ⎟⎟⎟ ⎟ ⎜ ⎟ . ⎟⎟⎟ ⎜⎜ . ⎟⎟⎟ ⎜⎜ . ⎟⎟ ⎜ ⎟ ⎟⎟ ⎜⎜ . ⎟⎟⎟ . . ⎟⎟ ⎜⎜ ⎜ . ⎟⎟⎟ ⎜⎜ . ⎟⎟⎟ ⎜⎜ . ⎟⎟⎟ ⎜ ⎟ ⎟ ⎟ X p1, n ⎟⎠ ⎜⎝ p1 ⎠ ⎜⎝ n ⎟⎠ (7.10) In matrix form, this can be written as Y X (7.11) The estimated parameter matrix is (X ′X )1 X ′Y (7.12) where X and Y are matrices whose entries are the realized values of the corresponding random variables. The covariance matrix of the estimated matrix is s2 ( ′) 1 (Y ′Y ′X ′Y )(X ′X )1 n p (7.13) The standard deviation matrix of the estimated matrix is s( ) , a diagonal matrix whose entries are the square root of the diagonal matrix elements of s2 ( ) . The 95% confidence interval of the estimated matrix is as follows [ t0.025 , np1 s( ), t0.025 , np1 s( )] Goodness of fit measure R2 and the F statistic The term R2 is also called the multiple coefficient of determination. R2 measures the total variance of the sample data explained by the regression model, and is given by the ratio of the variance explained by the multiple regression (Sum of Squares of Regression, or SSR) and the total variance (Total Sum of Squares, or TSS). The values of R2 range from zero to one. The difference between TSS and SSR is called Sum of Squares of Errors, or SSE. The higher R2 is, the higher the fraction of the sample variance explained by the linear regression model. R2 is given by the expression R2 ∑ ( y y )2 ∑ ( y i y )2 SSR TSS SSE SSE 1 TSS TSS TSS (7.14) Introduction to Data Mining As the number of independent variables increases in a particular model, the coefficient of determination tends to increase. 
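To see this behavior numerically, the sketch below (again assuming statsmodels, with synthetic data of our own) fits a multiple regression with and without an irrelevant predictor and reports R-squared, the adjusted R-squared, and the F statistic discussed next.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(11)
n = 200

# Two informative predictors and one pure-noise predictor.
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
noise_col = rng.normal(size=n)  # unrelated to y; should add little real explanatory power
y = 1.5 + 2.0 * x1 - 1.0 * x2 + rng.normal(scale=1.0, size=n)

X_small = sm.add_constant(np.column_stack([x1, x2]))
X_big = sm.add_constant(np.column_stack([x1, x2, noise_col]))

fit_small = sm.OLS(y, X_small).fit()
fit_big = sm.OLS(y, X_big).fit()

for label, fit in [("2 predictors", fit_small), ("2 predictors + noise", fit_big)]:
    print(f"{label:22s} R2 = {fit.rsquared:.4f}  adj R2 = {fit.rsquared_adj:.4f}  "
          f"F = {fit.fvalue:.1f} (p = {fit.f_pvalue:.2e})")
```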
When an increase in R2 is due to the increase in the number of independent variable rather than the incremental explanatory power of the additional independent variables, the model power is inflated. To avoid adoption of a model with inflated explanatory power, the adjusted R2 can be used instead of R2. Adjusted R2 = 1 (1 R2 ) n1 n p1 (7.15) where p is the number of independent variables. The F statistic is another statistical measure of the robustness of multiple regression. The F statistic is obtained by dividing Mean Squared Regression (MSR) by Mean Squared Error (MSE) (Neter, Wasserman, and Kutner 1990). MSR is defined as SSR divided by the degrees of freedom, p. MSE is SSE divided by its degrees of freedom, n p 1. F SSR/ p MSR MSE SSE /(n p 1) (7.16) We reject the null hypothesis that all the parameter estimates are zeros if the F statistic of a multiple regression model is greater than the critical F value, F1– ,p,n–p–1 where is the significance level. Additional regression techniques have been developed over the years to facilitate selection of independent variables. These techniques include backward, forward, and stepwise selection methods. Chapter 12 of the book Applied Statistical Methods (Neter, Wasserman, and Kutner 1990) has an in-depth discussion of these methods. Cluster analysis Cluster analysis is used to uncover interdependence between members of a sample. In this context, by members we mean objects of study, such as individuals or products, and by sample we mean the collection of such individuals or products used for conducting the study. By member attributes, we mean variables that describe the features and characteristics of members. For instance, age and income are member attributes of individuals. Through cluster analysis, members with similar values in the variables under analysis are grouped into clusters. Each member can only belong to one cluster. Cluster analysis is widely used in customer segmentation. Identification of members with similar characteristics is the key to cluster analysis. The following section provides an overview on how to measure similarity between members of a sample. 151 152 Data Mining and Market Intelligence Measurement of similarity between sample members Comprehension of the concept of similarity is the key to the understanding of cluster analysis. Various criteria, such as distance and correlation, can be used to measure similarity between sample members. In this chapter, we will focus on distance as a similarity measure of members. The Euclidean distance (Dillon and Goldstein 1984) measures the distance between two sample members, i and j, with p attributes. The shorter the distance is, the more similar the two members are to each other. If we assume that the realized values of the attributes of a sample member i can be represented by vector Xi = (X1i , X 2i , X 3 i , … , X pi ) and the values of the attributes of member j can be represented by a vector X j = (X1 j , X 2 j , X 3 j , … , X pj ) , then the Euclidean distance between the two members is d= kp ∑ (Xki Xkj )2 (7.17) k1 The Mahalanobis distance (Dillon and Goldstein 1984) is another method of measuring distances between members, and has the advantage over the Euclidean distance in that it takes into consideration correlation between the attributes of the members. The Mahalanobis distance, m, is defined by m = (Xi X j )′S1 (Xi X j ) (7.18) where S is the covariance matrix of the member attributes. 
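Both distance measures are simple to compute. The following is a minimal sketch, assuming scipy and attribute values invented for illustration, that evaluates the Euclidean distance of Eq. (7.17) and the Mahalanobis distance of Eq. (7.18) for two members with three attributes.

```python
import numpy as np
from scipy.spatial import distance

# Attribute vectors for two hypothetical members (e.g., age, income in $000s, tenure).
xi = np.array([35.0, 72.0, 4.0])
xj = np.array([42.0, 65.0, 9.0])

# Euclidean distance, Eq. (7.17)
d_euclidean = distance.euclidean(xi, xj)

# Mahalanobis distance, Eq. (7.18), requires the attribute covariance matrix S.
# Here S is estimated from a small made-up sample of members; note that scipy
# returns the square root of the quadratic form (Xi - Xj)' S^-1 (Xi - Xj).
rng = np.random.default_rng(5)
sample = rng.normal(loc=[40, 70, 6], scale=[8, 12, 3], size=(200, 3))
S_inv = np.linalg.inv(np.cov(sample, rowvar=False))
d_mahalanobis = distance.mahalanobis(xi, xj, S_inv)

print(f"Euclidean distance:   {d_euclidean:.3f}")
print(f"Mahalanobis distance: {d_mahalanobis:.3f}")
```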
Clustering techniques comprise hierarchical and partitioning methods (Dillon and Goldstein 1984). Hierarchical methods, in their turn, can be classified as either agglomerative or divisive. Agglomerative methods start out by treating each member as a cluster and then grouping members with particular similarities into clusters. Once a member is assigned to a cluster, it cannot be reassigned to another cluster. Divisive methods, commonly known as decision trees, begin by splitting the members into two or more groups. Each group can be further split into two or more subgroups, with the splitting process continuing until a preselected statistic reaches an assumed critical value. Hierarchical agglomerative methods There are four common approaches under hierarchical agglomerative methods (Dillon and Goldstein 1984). ● ● ● ● Nearest neighbor (single linkage) Furthest neighbor (complete linkage) Average linkage Ward’s error of sum of squares. Introduction to Data Mining The nearest-neighbor approach defines the distance between a member and a cluster of members as the shortest distance between the member and the cluster. The furthest neighbor approach defines the distance between a member and a cluster of members as the longest distance between the member and the cluster. The average linkage approach defines the distance between a member and a cluster of members as the average distance between the member and the cluster. This distance can be any statistical distance measure, such as Euclidean and Mahalanobis. The three approaches share the common rule that members and clusters that are close to one another are grouped into large clusters. The nearest-neighbor approach begins by grouping the two members with the shortest distance into a cluster. The approach next calculates the distance between this cluster and each of the remaining members and continues to group members and clusters that are closest to one another. The process continues until a cluster containing all members is formed. We next discuss an example of the nearest-neighbor method applied to a set of five members (A, B, C, D, and E). Figure 7-3 shows a matrix whose entries represent the distances between any two members. A B C D E A 0 4 65 12 9 B 4 0 45 34 10 C 65 45 0 17 22 D 12 34 17 0 12 E 9 10 22 12 0 Figure 7-3 Distance matrix illustration. As we can see in Figure 7-3, the distance between members A and B is the shortest one. Therefore, we start out by grouping these two members into the first cluster. After members A and B are grouped into one cluster, we calculate the distances between this cluster and the remaining members, as shown in Figure 7-4. 153 154 Data Mining and Market Intelligence A B C D E A 0 0 65 12 9 B 0 0 45 34 10 C 65 45 0 17 22 D 12 34 17 0 12 E 9 10 22 12 0 Figure 7-4 Distances after formation of the first cluster. The nearest distance between the first cluster (AB) and each of the remaining members and the distances between the remaining members are as follows (AB) and C: (AB) and D: (AB) and E: C and D: C and E: D and E: min(65, 45) 45 min(12, 34) 12 min(9, 10) 9 17 22 12 From this we infer that A, B, and E now form a cluster since nine is the shortest distance. The distance between this new cluster and the remaining members are. (ABE) and C: (ABE) and D: C and D: min(65, 45, 22) 22 min(12, 34, 12) 12 17 Since 12 is the shortest distance, this means that A, B, E, and D now form a cluster and C remains as a cluster of its own. In conclusion, the nearest-neighbor method has created two final clusters. 
One cluster consists of members A, B, D, and E, and the other cluster contains member C only. Introduction to Data Mining We next apply the furthest neighbor approach to the same set of members. The distance between members A and B is the shortest so these two are grouped into the first cluster from the outset. After members A and B are grouped into one cluster, the distances between the cluster and the remaining members are calculated. (AB) and C: (AB) and D: (AB) and E: C and D: C and E: D and E: max(65, 45) 65 max(12, 34) 34 max (9, 10) 10 17 22 12 From this we infer that A, B, and E now form a cluster since ten is the shortest distance. The distance between the cluster and the remaining members are (ABE) and C: (ABE) and D: C and D: max(65, 45, 22) 65 max(12, 34, 12) 34 17 This means that C and D form a cluster since seventeen is the shortest distance. In conclusion, the furthest neighbor method has created two final clusters. One cluster consists of members A, B, and E, and the other cluster contains members C and D. We next apply the average linkage approach to the same member sample. Since the distance between members A and B is the shortest, these two members are grouped into the first cluster from the outset. After members A and B are grouped into one cluster, the distances between the cluster and the remaining members are calculated and the results are as follows. (AB) and C: (AB) and D: (AB) and E: C and D: C and E: D and E: Average(65, 45) 55 Average(12, 34) 23 Average(9, 10) 9.5 17 22 12 From this we conclude that A, B, and E now form a cluster since 9.5 is the shortest distance. The distances between the cluster and the remaining members are (ABE) and C: (ABE) and D: C and D: Average(65, 45, 22) 44 Average(12, 34, 12) 19.3 17 155 156 Data Mining and Market Intelligence Members C and D form a cluster since seventeen is the shortest distance. Members A, B, and E form the other cluster. Ward’s Error Sum of Squares (Ward’s ESS) is a clustering approach that creates clusters by minimizing the sum of the within-cluster variance. The within-cluster variance is defined as (Dillon and Goldstein 1984) ⎛ nj ⎞⎟2 ⎞⎟⎟ 1 ⎜ ⎜ ESS ∑ ⎜⎜ ∑ X 2ij ⎜⎜ ∑ Xij ⎟⎟⎟ ⎟⎟⎟ ⎟⎠ ⎟⎟ n j ⎜⎝⎜ i=1 j1 ⎜ ⎜⎝ i1 ⎠ jk ⎛ ⎜ nj (7.19) Here, Xij is the attribute value of member i in cluster j. We next discuss an example where we apply the Ward’s approach to a sample of four members (A, B, C, and D). From the outset, each member forms its individual cluster. The attribute values of the four members are as follows. A: 4 B: 10 C: 5 D: 20 We first compute the Ward’s ESS for every possible cluster that can be formed with two members in the sample. ESS of members A and B = 4 2 10 2 12 ( 4 10)2 18 ESS of members A and C 4 2 52 12 ( 4 5)2 0.5 ESS of members A and D 4 2 20 2 12 ( 4 20)2 128 ESS of members B and C 10 2 52 12 (10 5)2 12.5 ESS of members B and D 10 2 20 2 12 (10 20)2 50 ESS of members C and D 52 20 2 12 (5 20)2 112.5 Members A and C form a cluster since the Ward’s ESS of their cluster is the lowest. Next we compute the Ward’s ESS for any possible cluster that consists of the (AC) cluster and each of the remaining members. ESS of the cluster that consists of members A, C, and B is 4 2 10 2 52 31 ( 4 10 5)2 20.67 ESS of the cluster that consists of members A, C, and D is 4 2 52 20 2 31 ( 4 5 20)2 160.67 The Ward’s ESS for the cluster of members A, C, and B is smaller than that of the clusters with members A, C, and D, and B and D. 
Therefore, Introduction to Data Mining members A, B and C form a cluster, and member D remains by itself in one cluster. Hierarchical divisive methods: AID, CHAID, and CART Hierarchical divisive methods start out by splitting a group of members into two or more subgroups and proceeds in the same splitting approach based on predetermined statistical criteria. The most common divisive methods are decision tree approaches such as Automatic Interaction Detection (AID), Chi-Square Automatic Interaction Detection (CHAID), and Classification and Regression Tree (CART). A decision tree approach starts out at the root of a tree where all members reside and splits the members into different subgroups (called branches or nodes). A tree is built in such a way that the variance of the dependent variable is maximized between groups and is minimized within groups. For instance, a group of consumers may be split into different age (independent variable) groups to maximize the variance of the household income (dependent variable) between the age groups. Figure 7-5 illustrates how a small decision tree looks like. There are two splits in the tree. The nodes that stop splitting are called terminal nodes. Nodes one, three, four, and five are terminal nodes. AID is a divisive approach that splits a group of members into binary branches. While in this approach the dependent variable needs to be metric, the independent variables can be either nonmetric or metric. Root (Node 0) Split 1 Node 1 Node 2 Node 3 Split 2 Node 4 Figure 7-5 Decision tree output illustration. Node 5 157 158 Data Mining and Market Intelligence Chi-Square Automatic Interaction Detection (CHAID) is more flexible than AID in that CHAID allows a group of members to be split into two or more branches. Given its flexibility, CHAID is more widely used in data mining than AID. The following are the key characteristics of CHAID. ● ● ● The dependent variable is usually nonmetric. The independent variables are either nonmetric or metric data with no natural zero value and no specific distribution constraints, but their possible number of groups should be no more than 15. (Struhl 1992) The chi square statistic is used to determine whether to further split a node. We next discuss an example where CHAID is used to better understand the member ’s attributes that are highly associated with the responsiveness to a direct mail promotion. The dependent variable, a binary variable with a value of ‘Yes’ or ‘No’, denotes the existence or absence of a response to the promotion. Assume there are 10,000 individuals in the marketing sample. Among them, 6000 individuals respond and 4000 do not respond to the promotion. Therefore, the overall all response rate to the direct mail promotion is 60%. Assume there are three responder attributes (independent variables) of interest, age, income, and gender. The CHAID approach assesses all the values of age, income, and gender to create splits with significant chi square statistics. Figure 7-6 illustrates the output of CHAID, where there are four terminal codes (nodes one, three, four, and five) in the tree. Age and gender are the drivers of responsiveness. Female subjects aged between 25 and 45 are the most responsive groups, followed by male subjects in the same age group. Subjects aged over 45, regardless of gender, are the least responsive individuals. We next review another example of a CHAID application to marketing. 
Assume a marketing manager at a gourmet cookware store wishes to find out which types of customers are more likely to purchase a high-end BBQ grill. The store has collected three pieces of demographic information about its customers: household income, marital status of the household head, and the number of children in the household. Assume the store has five years of detailed transaction history, and there are a total of 25,000 customers in the store database. The dependent variable, existence of a past purchase of a high-end grill, is a nominal variable with two possible values: purchase or no purchase. The three independent variables are household income, marital status of the household head, and the number of children in the household. CHAID is used to split the 25,000 customers into subgroups based on household income, the marital status of the household head, and the number of children. Introduction to Data Mining Node 0 Total: 10,000 Responders: 6000 Non-responders: 4000 Response rate: 60% Age [25–45] <25 Node 1 Total: 1300 Responders: 800 Non-responders: 500 Response rate: 62% Node 2 Total: 6200 Responders: 4200 Non-responders: 2000 Response rate: 68% >45 Node 3 Total: 2500 Responders: 1000 Non-responders: 1500 Response rate: 40% Gender Female Node 4 Total: 4000 Responders: 3000 Non-responders: 1000 Response rate: 75% Male Node 5 Total: 2200 Responders: 1200 Non-responders: 1000 Response rate: 55% Figure 7-6 CHAID analysis applied to direct mail promotion responses. Once the marketing manager understands which customer attributes are highly associated with the purchase of a BBQ grill, he can contact a list broker to rent lists of prospects with similar attributes. These prospects will form an ideal target audience for a BBQ grill marketing campaign. Consider yet another example of a CHAID application to marketing. Assume a high tech company specializing in enterprise networking security plans to launch an e-mail promotion. The promotion with a gift card offer will target a group of technology magazine subscribers. The purpose of the campaign is to acquire new customers who may currently be using a competitor ’s product and are considering additional purchases of similar products. Assume that there are over 50,000 subscribers to the technology magazine. The gift card offer has a value of $50 per subscriber so the offer cost will amount to $2.5 million if all the subscribers are targeted. In other words, in this particular situation it would be very costly to target all of the subscribers. The marketing manager at the high tech firm decides to segment the magazine subscriber list based on six attributes. 159 160 Data Mining and Market Intelligence ● ● ● ● ● ● The industry that a subscriber works for. The size of the company that a subscriber works for: For instance, the company size is 5000 if a subscriber works for a company with 5000 employees. Subscriber ’s office location type such as branch office, headquarters, and single location. Subscriber ’s role in network security purchase decision making such as authorizing and influencing. Subscriber job function such as information system, marketing and accounting. Subscriber ’s job title such as CTO, IT manager, and marketing VP. The marketing manager then uses the CHAID approach to analyze his internal database and to construct a profile of past security product buyers. The buyer profile indicates that networking security product buyers tend to work for large companies (company size of 1000 or more) in the banking industry. 
These buyers also tend to be IT managers and IT directors working at branch offices. Based on the above profile, the marketing manager next instructs the magazine publisher to select a targeted list of subscribers that are IT managers or IT directors who work at the branch offices of large banks. Assume the magazine company has a total of 2000 subscribers that meet the selection criteria. The total gift card offer cost is 2000 multiplied by $50, which amounts to $100,000. This cost is within the company’s program budget. Classification and Regression Tree (CART) is another decision tree approach. In this approach, both the dependent and the independent variable can be either metric or nonmetric (Struhl 1992). Like CHAID, CART is also widely used in data mining for marketing. Although the original CART algorithm allowed two-node splits only, there are now CART software implementations with revised algorithms that offer the flexibility of creating splits into more than two nodes. The following is a list of key characteristics of CART (Breiman, Friedman, Olshen, and Stone 1998). ● ● ● The dependent variable is metric or nonmetric with no specific distribution constraints. The independent variables can be either nonmetric or metric with no specific distribution constraints. In cases where the dependent variable and independent variable are both metric, the relationship between them can be either linear or nonlinear. When the dependent variable is metric, accuracy statistic measures such as average least squares can be used to determine whether a node continues to split. Introduction to Data Mining 1 n n ∑ (Yi Yˆ i )2 (7.20) i1 Yi is observed value of dependent variable Y of member i, Ŷi is the predicted value of variable Y and n is the number of members in the node. A node is split to minimize this particular statistic. When the dependent variable is nonmetric, the misclassification rate (the percentage of cases being misclassified) is used to determine whether a node should split further. The optimal tree is the one that minimizes the overall misclassification rate of all nodes in the tree (Breiman, Friedman, Olshen, and Stone 1998). In the next example, CART is used to better understand the drivers for customer purchases of a product. Assume three independent variables are analyzed: age, income, and gender. The dependent variable is the average annual purchase amount of a customer. Age and income are metric variables and gender is a nonmetric variable. There are a total of 5000 customers and the average purchase amount of these customers is $300. Figure 7-7 shows the final output of CART. There are three terminal codes (nodes two, three, and four) in the tree, and the 5000 customers are Node 0 Total: 5000 Average Purchase: $300 $50,000 Income $50,000 Node 4 Total: 1300 Average Purchase: $585 Node 1 Total: 3700 Average Purchase: $200 Age 30 Node 2 Total: 3000 Average Purchase: $100 Figure 7-7 CART output illustration. 30 Node 3 Total: 700 Average Purchase: $628 161 162 Data Mining and Market Intelligence assigned into these three nodes. The customers in node one on average have a purchase amount of $200. The customers in node two on average have a purchase amount of $100 and those in node three on average have a purchase amount of $628 and those in node four on average have a purchase amount of $585. CART has segmented the customers based on their purchase volume. Partitioning methods Partitioning methods assume that the initial number of clusters is predetermined. 
Unlike the hierarchical methods we discussed earlier, partitioning techniques allow for the reassignment of members from one cluster to another. One of the best-known partitioning techniques is the K-Means clustering method (Dillon and Goldstein 1984), which starts out by grouping the members into K clusters. There are numerous ways of creating these initial K clusters. Members with close proximity are grouped into the same clusters, and then are moved from one cluster to another to minimize the error of partition. If Xi,j,l is value of the jth attribute of member i in cluster l, X j , l is the mean of jth attribute in cluster l, p is the number of attributes, and n is the total number of members, the partition error is defined by in jp E ∑ ∑ ( X i , j , l X j , l )2 (7.21) i1 j1 We next discuss an example on how to create clusters based on the K-means approach. Assume there are six students (A, B, C, D, E, and F) with scores in three subjects: English, math, and music (as indicated in Table 7-1). The value of K is set to three. The Euclidean distances of scores between the students are shown in Figure 7-8. Table 7-1 K-means clustering example – student score raw data Student English score Math score Music score A B C D E F 60 100 55 98 70 98 90 85 70 65 80 100 78 90 40 95 44 78 Introduction to Data Mining A B C D E F A 0 42 43 49 37 39 B 42 0 69 21 55 19 C 43 69 0 70 18 65 D 49 21 70 0 60 39 E 37 55 18 60 0 48 F 39 19 65 39 48 0 Figure 7-8 Student score distance matrix. Three initial clusters are formed based on the distance matrix in Figure 7-9: A, (BDF), and (CE). We next compute the mean scores in English, math, and music by cluster. These mean scores, shown in Table 7-2, are used to derive the error of partition of the clusters. The error of partition of the initial three clusters, as defined by Eq. 7.21, is E (100 98.7)2 (98 98.7)2 (98 98.7)2 (85 83.3)2 (65 83.3)2 (100 83.3)2 (90 87.7)2 (95 87.7)2 (78 87.7)2 (55 62.5)2 (70 62.5)2 (70 75)2 (80 75)2 (40 42)2 (44 42)2 942.51 New clusters are formed if E decreases as a result of moving one student from one cluster to another. The final cluster configuration is the one with the lowest E. Principal component analysis Principal component analysis is a data reduction technique that can reduce the number of variables under analysis. The technique creates new variables called principal components that are linear combinations of the original variables. Principal components are uncorrelated to one another. We assume that there are m principal components (PC) derived from p original variables. Each principal component can be expressed as a linear combination of the original variables. 163 164 Data Mining and Market Intelligence A B C D E Frequency table 1 2 3 4 A B C D E 1 2 3 4 A B C D E 1 2 3 4 Row profiles Column profiles 1 D A B 2 C 4 3 E 1 D A B 2 C 4 3 E Figure 7-9 Correspondence analysis process. Table 7-2 Mean scores by cluster Student Mean English score Mean Math score Mean Music score A B, D, and F C and E 60 98.7 62.5 90 83.3 75 78 87.7 42 Introduction to Data Mining PC1 w11 X1 w12 X 2 w13 X 3 w1p X p PC2 w21 X1 w22 X 2 w23 X 3 w2 p X p PCm wm1 X1 wm 2 X 2 wm 3 X 3 wmp X p (7.22) where PCi is the ith principal component, wij is the coefficient of the jth original variable in the ith principal component, and Xj is the jth original variable. It is required that the sum of the squares of the coefficients in each principal component is one (Brooks 2002). 
For the ith principal component, this translates into the constraint wi12 wi 2 2 wi 3 2 wip 2 1 (7.23) It is also required that the coefficient vectors of the principal components must be orthogonal, namely, wi′w j 0 (7.24) where wi [wi1 , wi 2 , … , wip ], w j [w j1 , w j 2 , … , w jp ], and i ≠ j. If is the variance–covariance matrix of the original variables X and i is the ith eigenvalue of , the following condition must be satisfied for a nontrivial solution to exist. det ( ∑ i I ) 0 (7.25) The corresponding eigenvector of i is the factor loading vector wi. Given that is a symmetric matrix, the resulting eigenvectors are orthogonal to one another. The length of the eigenvectors can be scaled to unit length, as given by Eq. 7.23. The fraction of the total variance in the original variables that is explained by the ith principal component is given by ∑ i ip i=1 i (7.26) Factor analysis Factor analysis is also a data reduction technique that uncovers the underlying factors, fewer in number than the original variables that are common to the original variables. The original variables are linear combinations of these common factors. Notice the contrast with principal component 165 166 Data Mining and Market Intelligence analysis, where the principal components are linear combinations of the original variables. With wij (also called factor loading) the coefficient of the jth common factor in the ith original variable, fj the jth common factor, and error terms i, we can express the original variables, Xi, in terms of common factors as follows. X1 w11 f1 w12 f 2 w13 f 3 w1k f k 1 X 2 w21 f1 w22 f 2 w23 f 3 w2 k f k 2 X m wm1 f1 wm 2 f 2 wm 3 f 3 wmk f k m (7.27) In matrix notation, Eq. 7.27 can be expressed as X wf (7.28) With X a m 1 matrix that contains the original variables, w a m k matrix that contains all the coefficients of k common factors in the m original variables, f a k 1 matrix that contains the k common factors, and an m 1 matrix with error terms. It is assumed that the error terms are uncorrelated with each other and with the common factors, namely E( ) 0 and E(f ) 0 With the covariance matrix of the common factors, and the covariance matrix of the error terms, the variance of the original variables is given by the expression. var(X ) wff ′w ′ var() wΘw ′ (7.29) Assuming that the common factors have a variance of one and that they are not correlated with each other, then I (identity matrix) and Eq. 7.29 becomes (Dillon and Goldstein 1984) m k var(X ) ww ′ ∑ ∑ w ij 2 (7.30) i1 j1 The fraction of the variance of the original variables explained by the common factors is given by im jk ∑ i1 ∑ j1 wij 2 (7.31) var(X ) Discriminant analysis Discriminant analysis is a technique that examines the differences between two or more groups of members with respect to multiple members attributes Introduction to Data Mining (Klecka 1980). The dependent variable is a variable that indicates the group a member belongs to. An example of a dependent variable is a variable that has three possible values: ‘high-achiever group’, ‘medium-achiever group’, and ‘low-achiever group’. The independent variables are attributes associated with the members. Discriminant analysis can be used to predict which group a particular member with given attributes belongs to. To study the characteristics of each group using discriminant analysis, we start out by creating discriminant functions. Discriminant functions are linear combinations of the independent variables (member attributes), defined as follows. 
D1 b11 X1 b12 X 2 b13 X 3 b1p X p D2 b21 X1 b22 X 2 b23 X 3 b2 p X p Dm bm1 X1 bm 2 X 2 bm 3 X 3 bmp X p (7.32) where Di is the ith discriminant function, bip is the discriminant coefficient of the pth indepentent variable in the ith discriminant function, and Xp is the pth independent variable. A discriminant function is created to maximize the ratio of its betweengroup variance and the within-group variance. The value of a discriminant function is referred to as a discriminant score. Only those independent variables that are predictive of the dependent variable are included in the discriminant functions. Such variables are called discriminating variables. Statistics, such as Wilks’ lambda, the chi-square, and the F statistic are used to determine which independent variables are predictive (Klecka 1990). Wilks’ lambda is also used to assess the statistical significance of discriminant functions. The following is a list of key assumptions of discriminant analysis. ● ● ● ● Assumption 1: The dependent variable (group identity variable) is nonmetric. Assumption 2: The discriminating variables have a multivariate normal distribution. Assumption 3: The variance–covariance matrix of the discriminating variables is the same across groups. Assumption 4: Each member can belong to one and only one group. There are similarities between linear regression, CHAID, and discriminant analysis in that all of these three techniques are used to explore the dependence between a set of dependent and independent variables. However, there are basic differences between these three techniques in terms of assumptions (Dillon and Goldstein 1984). In a linear regression, the dependent variable is assumed to be a normally distributed random 167 168 Data Mining and Market Intelligence variable and the independent variables are assumed to be fixed. In discriminant analysis, it is assumed that the dependent variable is fixed and the discriminating variables are normally distributed. In CHAID, there are no distributional assumptions about the dependent or independent variables. A further difference between CHAID and discriminant analysis is that discriminant analysis constructs discriminant functions as linear combinations of the discriminating variables (independent variables), while CHAID does not assume any such linear relationship. What CHAID and discriminant analysis have in common is that both minimize misclassification by maximizing the ratio of the variance between groups and the variance within groups. Correspondence analysis Correspondence analysis, also called dual scaling, is used to analyze the association between two or more categorical variables and to visually represent this association in a low dimensionality diagram, called a perceptual map. This method is particularly useful for analyzing large contingency tables. In correspondence analysis it is assumed that the variables under analysis need to be categorical variables. In his book, Applied Correspondent Analysis, Clausen proposes a stepby-step process for conducting a correspondence analysis, illustrated in Figure 7-9. The steps in the process of a correspondence analysis with two categorical variables are as follows. ● ● ● Step one is the creation of a frequency table with the two categorical variables: Assume that the X variable has k possible values and the Y variable has l possible values. In this k 1 frequency table, entry nij is the number of members whose X variable value equals i and whose Y variable value equals j. 
The number of members whose X variable equals i is ni and whose Y variable equals j is nj. ni and nj are called the row total and column total, respectively. Step two is to set up a row profile table and a column profile table. The frequency table can be transformed to a row profile table by dividing each entry nij by the row total ni. The frequency table can be transformed into a column profile table by dividing each entry nij by the column total nj. Step three is to generate two key underlying dimensions for variables X and Y and to plot both variables on a two-dimensional map. The dimensions are selected based on the proportion of the variance of the original variables that these dimensions explain. The higher the proportion is, the more significant the dimension is. A detailed discussion (Clausen 1998) on the mathematical derivation of the dimensions is beyond the scope of this book. Introduction to Data Mining We next discuss a correspondence analysis example. Assume a firm conducts an online survey to measure the satisfaction level of visitors to the firm’s website. The total number of visitors is 4492. The visitors are classified into three types based on their familiarity of the site: first time visitors, frequent visitors, and infrequent visitors. The visitors are asked which of the following three features of the website is the most important to them: content, navigation, and presentation. Correspondence analysis will be used to analyze the survey response data and SAS will be used as the data-mining tool. In step one of the analysis, a contingency table (frequency table) is created based on visitor types and visitors’ selection of the most important website feature (content, navigation, or presentation) as illustrated in Table 7-3. Next the row profiles (row percentages) and the column profiles (column percentages) are created. Tables 7-4 and 7-5 show the row profile and the column profile respectively. The column profiles show that the infrequent visitors are likely to rate navigation as an important driving factor of their satisfaction of the site. Frequent visitors tend to identify content as the driving factor of their satisfaction with the site. Table 7-3 Contingency table of visitor type and importance of the three website features Visitor type New visitor Infrequent visitor Frequent visitor Total Most important web site feature Content Navigation Presentation Total 230 799 2400 3429 100 450 100 650 100 113 200 413 430 1362 2700 4492 Table 7-4 Web visitors row profiles Visitor type New visitor Infrequent visitor Frequent visitor Row profile Content Navigation Presentation Total 0.54 0.59 0.89 0.23 0.33 0.04 0.23 0.08 0.07 1 1 1 169 170 Data Mining and Market Intelligence Table 7-5 Web visitors column profiles Column profile Visitor type Content Navigation Presentation New visitor Infrequent visitor Frequent visitor Total 0.07 0.23 0.70 1 0.15 0.70 0.15 1 0.25 0.27 0.48 1 In step two of the analysis, two key dimensions are created. Figure 7-10 shows the SAS output of the analysis. The first dimension explains 87.89% of the variance of the data, while the second explains 12.11% of the variance of the data. In the third step of the analysis a two-dimensional correspondence map is created. Table 7-6 and Table 7-7 show the row and column coordinates of the two new dimensions. The first dimension shows a large weight on navigation, an indication of high association between this dimension and the navigation feature. Therefore, we may label this dimension as ‘site navigation’. 
The second dimension has a large weight on site presentation, an indication of high Inertia and chi-square decomposition Dimension Singular value Principal Chiinertia square percent Cumulative percent 18 36 54 72 90 ----+----+----+----+----+--Dimension 1 0.39754 0.15804 709.904 87.89 Dimension 2 0.14755 0.02177 Total 87.89 **************** 97.797 12.11 100.00 *** 0.17981 807.701 100.00 Degrees of freedom 4 Figure 7-10 Correspondence analysis key dimensions and their explanatory power. Introduction to Data Mining Table 7-6 Row coordinates of the two dimensions in correspondence analysis Visitor type Dimension 1 Dimension 2 New visitor Infrequent visitor Frequent visitor 0.3900 0.5165 0.3277 0.4298 0.1152 0.0103 Table 7-7 Column coordinates of the two dimensions in correspondence analysis Web site feature Dimension 1 Dimension 2 Content Navigation Presentation 0.1995 0.9256 0.2000 0.0356 0.1033 0.4577 association between this dimension and the site presentation feature. Therefore, we may label this dimension as ‘site presentation’. Figure 7-11 shows the SAS output of the correspondence analysis. The corresponding code is shown in Figure 7-12. Dimension 2 (12.11%) 0.50 Presentation New visitors 0.25 0.00 0.25 Frequent visitors Content 0.75 0.50 0.25 Navigation infrequent visitors 0.00 0.25 Dimension 1 (87.89%) Figure 7-11 Correspondence analysis map. 0.50 0.75 1.00 171 172 Data Mining and Market Intelligence *--- Create Input Data---; data sasuser. corres_ex; length visitor $30; input visitor $ content navigation preso; cards; ‘New_Visitors’ 230 100 100 ‘Infrequent_Visitors’ 799 450 113 ‘Frequent_Visitors’ 2400 100 200 run; *---Perform Simple Correspondence Analysis---; proc corresp all data=sasuser.corres_ex outc=outcorres; var content navigation preso; id visitor; run; *---Plot the simple correspondence analysis results---; %plotit(data=outcorres, datatype=corresp); Figure 7-12 SAS code for generating correspondence map. Analysis of variance Analysis of Variance (ANOVA) is a statistical technique used to quantify the dependence relationship between dependent and independent variables. The technique is often used in experimental design where we wish to assess the impact of stimuli (treatments or independent variables) on one or more dependent variables. Although widely used for assessing experimental results in the pharmaceutical and social sciences, ANOVA has recently gained traction in marketing, especially when applied to the understanding of marketing stimuli on audience responses. To be consistent with the application of ANOVA to experimental design, we will use the treatment and block concepts introduced in Chapter 6. Treatments Introduction to Data Mining are the independent (or predictive) variables in an experimental design, calibrated to observe potential causality. Blocks are groups of similar units of which roughly equal numbers of units are assigned to each treatment group. Units are objects, individuals, or subjects that are either subjected, or not subjected to a treatment in an experiment. One-way ANOVA is used to analyze the treatment effects on subjects in an experiment. Two-way ANOVA is used to analyze both the treatment and the block (sometimes referred as replication) effects on subjects (Snedecor and Cochran 1989). One-way ANOVA assumes that any changes in subject behavior or characteristics are the result from treatment. 
This relationship between the subject behavior (dependent variable) and the treatment effects is expressed as:

$$X_{ij} = \mu + \tau_j + \varepsilon_{ij} \qquad (7.33)$$

where $X_{ij}$ is the value of the dependent variable for subject $i$ under treatment $j$, $\mu$ is the mean of the dependent variable across all subjects, $\tau_j$ is the difference between $\mu$ and $\mu_j$, the mean of the dependent variable for the subjects under treatment $j$, and $\varepsilon_{ij}$ is the error term of subject $i$ under treatment $j$. The error term represents the portion of the change in subject behavior that is due to random effects. In a one-way ANOVA analysis, the F statistic is used to assess the statistical significance of the treatment effects. The F statistic is the ratio of the mean treatment sum of squares (defined below) to the mean error sum of squares (defined below), with (a − 1, n − a) degrees of freedom given a treatments and n subjects. The mean treatment sum of squares is defined as

$$\frac{1}{a-1}\sum_{j=1}^{a} n_j(\mu_j - \mu)^2 \qquad (7.34)$$

where $n_j$ is the number of subjects under treatment $j$, and $\mu_j$ is the mean of the dependent variable of the subjects under treatment $j$. The mean error sum of squares is

$$\frac{1}{n-a}\sum_{j=1}^{a}\sum_{i=1}^{n_j}(X_{ij} - \mu_j)^2 \qquad (7.35)$$

Hence, the F statistic is

$$F = \frac{\sum_{j=1}^{a} n_j(\mu_j - \mu)^2 \,/\, (a-1)}{\sum_{j=1}^{a}\sum_{i=1}^{n_j}(X_{ij} - \mu_j)^2 \,/\, (n-a)} \qquad (7.36)$$

If the F statistic for the treatment effects is greater than the critical F value at a specified significance level, then there is a difference in the mean of the dependent variable across the treatment groups and the treatment effects are statistically significant. Two-way ANOVA assumes that changes in subject behavior or characteristics are due to both the treatment effects and the block effects. This relationship between the subject behavior, the treatment effects, and the block effects is expressed as

$$X_{ijk} = \mu + \tau_j + \beta_k + \varepsilon_{ijk} \qquad (7.37)$$

where $X_{ijk}$ is the value of the dependent variable of subject $i$ in block $k$ under treatment $j$, $\mu$ is the mean of the dependent variable across all subjects, $\tau_j$ is the difference between $\mu$ and $\mu_j$, the mean of the dependent variable for the subjects under treatment $j$, $\beta_k$ is the difference between $\mu$ and $\mu_k$, the mean of the dependent variable for the subjects in block $k$, and $\varepsilon_{ijk}$ is the error term of subject $i$ in block $k$ under treatment $j$. For a two-way ANOVA analysis, the F statistic is used to assess the statistical significance of the treatment effects and the block effects. The F statistic for assessing the treatment effects is the ratio of the mean treatment sum of squares (Eq. 7.34) to the mean error sum of squares, with (a − 1, ab − a − b + 1) degrees of freedom given a treatments and b blocks. The F statistic for assessing the statistical significance of the treatment effects is

$$F_{a-1,\; ab-a-b+1} = \frac{\dfrac{1}{a-1}\displaystyle\sum_{j=1}^{a} n_j(\mu_j - \mu)^2}{\dfrac{1}{ab-a-b+1}\left[\displaystyle\sum_{i,j,k}(X_{ijk} - \mu)^2 - \sum_{k=1}^{b} m_k(\mu_k - \mu)^2 - \sum_{j=1}^{a} n_j(\mu_j - \mu)^2\right]} \qquad (7.38)$$

where $X_{ijk}$ is the value of the dependent variable for subject $i$ in block $k$ under treatment $j$, $\mu$ is the mean of the dependent variable across all subjects, $\mu_j$ is the mean of the dependent variable for the subjects under treatment $j$, $\mu_k$ is the mean of the dependent variable for the subjects in block $k$, $n_j$ is the number of subjects under treatment $j$, and $m_k$ is the number of subjects in block $k$. If the F statistic for the treatment effects is greater than the critical F value at a specified significance level, then there is a difference in the mean of the dependent variable between treatments and the treatment effects are statistically significant.
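As a concrete check on Eqs. 7.34–7.36, the following sketch computes the one-way F statistic for three hypothetical treatment groups (for example, three different e-mail offers) and compares it with SciPy's built-in test. The data values are invented for illustration; the sketch assumes Python with NumPy and SciPy.

```python
import numpy as np
from scipy.stats import f, f_oneway

# Hypothetical responses (e.g., purchase amounts) under three treatments.
treatments = {
    "offer_A": np.array([12.0, 15.0, 14.0, 10.0, 13.0]),
    "offer_B": np.array([18.0, 21.0, 17.0, 20.0, 19.0]),
    "offer_C": np.array([11.0, 9.0, 12.0, 10.0, 11.0]),
}

groups = list(treatments.values())
a = len(groups)                               # number of treatments
n = sum(len(g) for g in groups)               # total number of subjects
grand_mean = np.concatenate(groups).mean()    # overall mean

# Mean treatment sum of squares (Eq. 7.34) and mean error sum of squares (Eq. 7.35)
ms_treatment = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups) / (a - 1)
ms_error = sum(((g - g.mean()) ** 2).sum() for g in groups) / (n - a)

F = ms_treatment / ms_error                   # Eq. 7.36
p_value = f.sf(F, a - 1, n - a)               # right-tail probability
print(f"F({a - 1}, {n - a}) = {F:.2f}, p = {p_value:.4g}")

# SciPy's built-in one-way ANOVA should reproduce the same F and p-value.
print(f_oneway(*groups))
```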
The F statistic for assessing the block effects is the ratio of mean block sum of squares and the mean error sums of squares with Introduction to Data Mining (b 1, ab a b 1) degrees of freedom given a treatments and b blocks. The mean block sum of squares is b 1 mk ( k )2 ∑ b 1 k1 (7.39) The F statistic for assessing the statistical significance of the block effects is the ratio of mean block sum of squares and mean error sums of squares with (b 1, ab a b 1) degrees of freedom (Mead 1989). ab a b 1 kb ∑ k1 mk ( j )2 1 b Fb1, abab1 in , ja , kb kb ja ∑ i1,j j1, k1 (Xijk )2 ∑ k1 mk ( k )2 ∑ j1 n j ( j )2 (7.40) If the F statistic for the block effects is greater than the critical F value at a specified significance level, then there is a difference in the mean of the dependent variable between blocks. In other words, the block effects are statistically significant. Canonical correlation analysis Canonical correlation analysis is used to analyze correlation between two groups of metric variables, where each set consists of one or more variables. Simple and multiple linear regression are particular cases of canonical analysis. Simple linear regression has one variable in both sets, while multiple linear regression has one variable in one set of variables, and multiple variables in the other set. Often, one set of variables is interpreted as dependent variables and the other as independent variables, as in the case of linear regression (Dillon and Goldstein 1984). Given a set of X variables and a set of Y variables, a canonical analysis finds X*, a linear combination of the X variables, and Y*, a linear combination of the Y variables such that X* and Y* are highly correlated. X* and Y* are called canonical variates. The analysis often results in multiple sets of X* and Y*. The coefficients in the linear combinations are called canonical weights. With this, X* and Y* can be written as: X * a1 X1 a2 X 2 am X m (7.41) Y * b1Y1 b2 Y2 bp Yp (7.42) where a1, a2, …, am are the canonical weights for canonical variate X*, and b1, b2, …, bp are the canonical coefficients for the canonical variate Y*. 175 176 Data Mining and Market Intelligence The set of X* and Y* with the highest correlation among all the possible set of canonical variates is called the first set of canonical variates. Canonical variates are normalized to have unit variance (Dillon and Goldstein 1984). We next discuss an example of canonical correlation analysis. Assume company ABC conducts an online survey to measure how the demographics of its website visitors correlate with their satisfaction about the firm’s website. The respondents are asked to rate their satisfaction in the following five areas. ● ● ● ● ● Satisfaction with ABC as a company Overall satisfaction with the website Satisfaction with the content of the website Satisfaction with the ease of navigation of the website Satisfaction with the presentation of the website. The respondents are also asked to answer the following set of questions about their own characteristics (web activities and their demographics). ● ● ● ● Frequency of their visit to the website of company ABC The number of personal computers in the respondent’s company The number of employees in the respondent’s company Their need for technology consulting services. The canonical correlation analysis technique can be applied to determine how the respondent satisfaction about the website correlates with the respondent characteristics (demographics and web activities). 
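Before turning to the survey results, the mechanics of extracting canonical variates can be illustrated on synthetic data. This is only a sketch, assuming Python with scikit-learn's CCA estimator and made-up data; the survey example itself is analyzed with SAS PROC CANCORR, whose output appears next.

```python
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)

# Synthetic data: 200 respondents, three characteristic variables (X set)
# and two satisfaction variables (Y set) that partly depend on the X set.
X = rng.normal(size=(200, 3))
Y = 0.6 * X[:, :2] + 0.4 * rng.normal(size=(200, 2))

cca = CCA(n_components=2)
X_scores, Y_scores = cca.fit_transform(X, Y)

# Canonical correlations: the correlation between each pair of canonical variates.
for i in range(2):
    r = np.corrcoef(X_scores[:, i], Y_scores[:, i])[0, 1]
    print(f"canonical correlation {i + 1}: {r:.3f}")

# cca.x_weights_ and cca.y_weights_ hold the estimated weights, which play the
# role of the a and b coefficients of Eqs. 7.41 and 7.42 (up to scaling).
```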
Figure 7-13 shows the SAS output of the canonical correlation coefficients. The four largest canonical correlation coefficients are 0.39, 0.19, 0.07, and 0.007. The canonical analysis creates four satisfaction canonical variates and four characteristics canonical variates. Figure 7-14 shows the eigenvalues associated with each of the four canonical coefficients. Table 7-8 shows how the five original satisfaction variables are correlated with the four satisfaction canonical variates. Table 7-9 shows the Pearson product-moment correlation coefficients between the four original respondent characteristic variables and the four characteristic canonical variates. Multi-dimensional scaling analysis The technique is used to construct a low-dimensional (two-dimensional, for example) map that best describes the relative positions and proximity of members in a multi-dimensional space. Multi-dimensional scaling (MDS) can be applied to either metric or nonmetric data. MDS applied to metric and nonmetric data are called metric MDS and nonmetric MDS, respectively. MDS is similar to factor analysis and principal component analysis in that all of them are data reduction techniques. Here, we will focus on metric MDS. Introduction to Data Mining The CANCORR Procedure Canonical correlation analysis Canonical correlation Adjusted Canonical correlation Approximate standard error Squared correlation 1 0.391113 0.389369 0.012468 0.152969 2 0.190260 0.187646 0.014187 0.036199 3 0.071957 . 0.014644 0.005178 4 0.007380 . 0.014719 0.000054 Canonical Figure 7-13 Top four canonical correlation coefficients. Test of H0: The canonical correlations in the current row and all that follow are zero Eigenvalues of Inv (E)*H = CanRsq/(1-CanRsq) Likelihood Approximate Eigenvalve Difference Proportion Cumulative ratio F Value Num df Den df Pr > F 1 2 3 4 0.1806 0.0376 0.0052 0.0001 0.1430 0.8083 0.8083 0.81209783 0.0324 0.1681 0.9765 0.95875856 0.0052 0.0233 0.9998 0.99476798 0.0002 1.0000 0.99994554 49.48 16.30 4.03 0.13 20 15281 <.0001 12 12192 <.0001 6 9218 0.0005 2 4610 0.8820 Figure 7-14 Canonical correlation coefficients and their corresponding eigenvalues. Metric MDS, often referred as classical scaling, uses the distance between members to measure their similarity in certain attributes. There are three steps in performing a metric MDS analysis (Dillon and Goldstein 1984). 177 178 Data Mining and Market Intelligence Table 7-8 Correlation between the original satisfaction variables and their canonical variates Satisfaction Brand satisfaction Website satisfaction Content satisfaction Navigation satisfaction Presentation satisfaction Canonical variate 1 Canonical variate 2 Canonical variate 3 Canonical variate 4 0.8785 0.2004 0.2322 0.2457 0.8686 0.0247 0.0040 0.4814 0.2835 0.8539 0.1675 0.3560 0.7684 0.1172 0.1245 0.0672 0.6775 0.2655 0.6796 0.0590 Table 7-9 Correlation between the original company characteristic vari- ables and their canonical variates Characteristics Consulting need Visit frequency Company size Number of PCs Canonical variate 1 Canonical variate 2 Canonical variate 3 Canonical variate 4 0.8568 0.4301 0.4031 0.4025 0.0949 0.8617 0.4835 0.4806 0.5022 0.2637 0.7058 0.7448 0.0686 0.0546 0.3251 0.2285 The first step of a metric MDS analysis is to create a distance matrix D for a group of members with p attributes. The entries in D, denoted by dij, are the Euclidean distances between the ith and jth members. 
If Xik is the value of the kth attribute of member i, and Xjk is the value of the kth attribute of member j, the Euclidian distances are given by ⎪⎧ kp ⎪⎫ dij ⎪⎨ ∑ (Xik X jk )2 ⎪⎬ (7.43) ⎪⎪ k1 ⎪⎪ ⎩ ⎭ Matrix D is symmetric with zero diagonal elements and non-negative off-diagonal elements. Introduction to Data Mining The second step in metric MDS analysis is to transform the distance matrix D into another symmetric matrix B, whose entries are defined as follows 1 bij (dij 2 di.2 d. j 2 d..2 ) 2 where di.2 1 n 1 (7.44) 1 ∑ j dij 2 , d. j 2 n ∑ i dij 2 , and d..2 n2 ∑ j ∑ i dij 2 The third step in the analysis is to compute the two largest eigenvalues of matrix B and their associated eigenvectors. The elements of the eigenvectors are the coordinates of the members in the new lower-dimensional space. Time series analysis Time series analysis is used to analyze data collected at uniformly distributed time intervals. An example of time series data is the trade volume of a public traded company at the beginning of the month (Figure 7-15). The objective of time series analysis is twofold: ● To describe the pattern of time series data as a function of time and other variables. A pattern is a stable function that describes the general trend of a time series. 10,000 Trade volume (in thousands) 9000 8000 7000 6000 5000 4000 3000 2000 1000 Date Figure 7-15 Trade volume time series. 10/7/2005 7/7/2005 4/7/2005 1/7/2005 7/7/2004 10/7/2004 4/7/2004 1/7/2004 10/7/2003 7/7/2003 4/7/2003 1/7/2003 7/7/2002 10/7/2002 4/7/2002 1/7/2002 0 179 180 Data Mining and Market Intelligence ● To forecast the occurrence of future events based on historical data. Forecasting must be understood in terms of probabilities of occurrence given the historical information represented by the data. Autocorrelation is an important concept in time series analysis. The autocorrelation coefficient is the correlation coefficient of a time series variable at time t, and the same variable at time t k (a time lag of k), and can be expressed as rk (Miller and Wichern 1977). t = N −k rk = (1 / N )∑ t=1 (1 / N )∑ ( yt − y )( yt+k − y ) t= N t=1 ( yt − y )2 t = N −k = ∑ t=1 ∑ ( yt − y )( yt+k t= N ( y t − y )2 t=1 − y) (7.45) Time series analysis is usually conducted under conditions of stationarity. In a stationary time series, the variable under analysis has constant mean, variance, and autocorrelation. To facilitate the type of time series analysis we discuss here, nonstationary time series data needs to be transformed until it becomes stationary. The most common transformation is to difference the series by replacing of the value of a variable at time t with the difference between the value at time t and the value at time t 1. Sometimes, achieving stationarity may require multiple differencing iterations. Here we focus on autoregressive (AR) and moving average (MA) time series analysis models. In an autoregressive (AR) model, variable Y at time t is a function of variable Y at time t k, Yt k Ytk t k 1, 2,… (7.46) where is the intercept, k is an autoregressive parameter, and t is the error term at time t. Autoregressive models are based on five assumptions. 
● ● ● ● ● Assumption 1: Expected value of the error term E(t) 0 Assumption 2: Constant variance of the error term var(t) 2 Assumption 3: Covariance between the error terms E(tj) 0 where ij Assumption 4: Covariance between the error term and time series value at time t k, E(tytk) 0 Assumption 5: 1 k 1 A first-order autoregressive model is a model where the value of variable Y at time t, Yt, is a linear function of value of variable Y at time t 1, Yt1. A first-order autoregressive model can be expressed as follows yt 1 yt1 t (7.47) Introduction to Data Mining where is an intercept, 1 is the first-order autoregressive parameter, and t is the error term at time t. A second-order autoregressive model is a model where Yt, is a linear function of Yt1 and Yt2. Yt 1Yt1 2 Yt2 t (7.48) where is an intercept, 1 is the first-order autoregressive parameter, 2 is the second-order autoregressive parameter, and t is the error term at time t. We can extend the autoregressive model to kth-order model ik Yt ∑ i Yti t i1 (7.49) where is an intercept, i is the ith-order autoregressive parameter, and t is the error term at time t. In a MA model, variable Y at time t is a function of the error terms at various points in time. y t et k etk k 1, 2,… (7.50) with a constant, ei an error term at time t, et–k the error term at time t k and k a MA parameter. MA models are based on four assumptions. ● ● ● ● Assumption 1: The expected value of the error term E(et) 0 Assumption 2: Constant variance of the error term var(et) 2 Assumption 3: Covariance between the error terms E(eiej) 0, where i j. Assumption 4: 1 k 1 A first-order MA model is a model where the value of variable Y at time t, Yt, is a function of the error term at time t, et, and the error term at time t 1, et1. y t et 1 et1 (7.51) where is a constant and 1 is a first-order MA parameter. In a second-order MA Yt, is a linear function of the error terms et, et1, and et2. Yt et 1 et1 2 et2 (7.52) where is a constant, 1 is a first-order MA parameter, and 2 is a second-order MA parameter. The autocorrelation function (ACF) and partial autocorrelation function (PACF) can be used to determine the appropriate type of model (AR 181 182 Data Mining and Market Intelligence or MA) to use. An ACF plot, referred to as a correlogram, is a chart that displays the autocorrelation coefficient (defined in Eq. 7.45), along with its confidence interval, against the number of time unit lags. The mutual influence of elements in a time series that are separated by more than one lag is affected by the correlation of intermediate lags. The PACF expresses the correlation coefficient between a variable and a particular lag when the correlation effects of intermediate lags are removed. To compute the PACF of a particular lag, the linear influence of intermediate lags must be removed. In the particular case of unit lag, partial autocorrelation and autocorrelation are equivalent. The following are some rules for using ACF and PACF plots as suggested by the SPSS software. ● ● ● If a time series is not stationary, its ACF plot will likely show that the autocorrelation coefficient remains high for half a dozen or more lags, rather than quickly declining to zero. If a time series has an autoregressive property, its ACF plot will be characterized by an exponentially declining autocorrelation. Its PACF plot will show spikes in the first few lags. The number of distinct spikes indicates the order of an autoregressive model. 
If a time series has a MA property, its correlogram will show spikes in the first few lags. Its PACF plot will be characterized by an exponentially declining pattern. The number of distinct spikes in the correlogram gives an indication of the order of the MA model needed to satisfactorily describe the time series. As an example of the application of ACF and PACF to time series analysis, consider the data in Figure 7-15, describing monthly stock trade volume of a financial company from January 2002 to December 2005. The ACF plot of this time series is illustrated in Figure 7-16. In this plot, the autocorrelation coefficient remains significant for half a dozen or more lags, rather than quickly declining to zero. The pattern indicates that the data is not stationary and will need to be differenced. The result of the first differencing is shown in Table 7-10 as the value of variable ‘Volume-1’ on February 1, 2002 is 619,340 is computed by subtracting 5,287,016, the trade volume on January 7, 2002, from 4,667,673, and the original trade volume on February 1, 2002. Using the differenced data, we produce new ACF and PACF plots, shown in Figures 7-17 and 7-18 to examine any potential autoregressive or MA properties in the data. The ACF plot in Figure 7-17 shows an overall decline in ACF, but there are still significant values for lag values of eight, twelve, and thirteen. The PACF plot in Figure 7-18 shows spikes for lag values of one, two, three, five, seven, and nine. We next apply one additional degree of differencing Introduction to Data Mining Volume 1.0 ACF 0.5 0.0 0.5 1.0 1 2 3 Coefficient 4 5 6 7 8 9 10 11 12 13 14 15 16 Lag number Upper confidence limit Lower confidence limit Figure 7-16 Stock trade volume time series ACF (SPSS output). to remove the spikes at lags five, seven, and nine. The resulting ACF and PACF are shown in Figure 7-19 and Figure 7-20, respectively. After two degrees of differencing, the spikes at lags one, two, and three of the PACF chart remain but those at lag five, seven, and nine are significantly diminished. At this stage, we can attempt to fit an autoregressive model with the data at two degrees of differencing. Some time series models possess both autoregressive and moving average properties after the underlying data has been differenced appropriately to achieve stationarity. These models are referred as autoregressive integrated moving average models, ARIMA (p, d, q). Parameters p, d, and q indicate the order of autocorrelation, the number differencing passes, and the moving average order, respectively. Standard ARIMA time series analysis is a three-step process: model specification, parameter estimation, and diagnostic checking (Miller and Wichern 1977). 
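The lag-k autocorrelation of Eq. 7.45 and the differencing transformation are straightforward to compute directly. The following is a minimal sketch on a synthetic trending series (the volume values are invented for illustration), assuming Python with NumPy:

```python
import numpy as np

def autocorr(y, k):
    """Lag-k autocorrelation coefficient as defined in Eq. 7.45."""
    y = np.asarray(y, dtype=float)
    y_bar = y.mean()
    num = np.sum((y[: len(y) - k] - y_bar) * (y[k:] - y_bar))
    den = np.sum((y - y_bar) ** 2)
    return num / den

# Hypothetical monthly trade volumes (in thousands): a trending, non-stationary series.
rng = np.random.default_rng(1)
volume = 4000 + 50 * np.arange(48) + rng.normal(0, 300, size=48)

# ACF of the raw series stays high over many lags, a sign of non-stationarity...
print([round(autocorr(volume, k), 2) for k in range(1, 7)])

# ...so the series is differenced: volume_1[t] = volume[t] - volume[t-1].
volume_1 = np.diff(volume)
print([round(autocorr(volume_1, k), 2) for k in range(1, 7)])
```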
183 Data Mining and Market Intelligence Table 7-10 Differencing the data – the first ten data points of the time series Date Close price Trade volume Volume_1 Volume_2 Volume_3 7-Jan-02 1-Feb-02 1-Mar-02 1-Apr-02 1-May-02 3-Jun-02 1-Jul-02 1-Aug-02 3-Sep-02 1-Oct-02 14.37 13.04 13.09 11.39 12.09 11.20 8.95 9.18 8.70 9.18 5,287,016.00 4,667,673.00 4,346,905.00 4,882,322.00 5,022,504.00 5,086,200.00 5,840,595.00 4,582,559.00 5,014,290.00 4,588,421.00 298,575.00 856,185.00 395,235.00 76,486.00 690,699.00 2,012,431.00 1,689,767.00 857,600.00 557,610.00 1,251,420.00 318,749.00 767,185.00 2,703,130.00 3,702,198.00 2,547,367.00 619,343.00 320,768.00 535,417.00 140,182.00 63,696.00 754,395.00 1,258,036.00 431,731.00 425,869.00 DIFF (Volume_1) 1.0 0.5 ACF 184 0.0 0.5 1.0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 Lag number Coefficient Upper confidence limit Figure 7-17 ACF with difference of one (SPSS output). Lower confidence limit Introduction to Data Mining DIFF (Volume_1) 1.0 Partial ACF 0.5 0.0 0.5 1.0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 Lag number Coefficient Upper confidence limit Lower confidence limit Figure 7-18 PACF with difference of one (SPSS output). ● ● ● Model specification: The first step in building an ARIMA model is to identify the appropriate values for model parameters p, d, and q. Plots on raw data, such as ACF and PACF can be used to uncover any underlying data pattern. Take as an example the time series data on stock trade volume in Figure 7-15. The time series becomes stationary and exhibits an autoregressive pattern of up to three lags after two degrees of differencing. Parameter estimation: During the parameter estimation phase, a function minimization algorithm is used to maximize the likelihood of the observed time series, given the parameter values. In regression, this requires minimization of the sum of squares of the errors. Diagnostic checking: In the diagnostic checking phase, the data is examined to uncover any violation of the key assumptions. The outcome of diagnostic checking may lead to modification of the model by changing the number of parameters. 185 Data Mining and Market Intelligence DIFF (Volume_2) 1.0 0.5 ACF 186 0.0 0.5 1.0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 Lag number Coefficient Upper confidence limit Lower confidence limit Figure 7-19 ACF with difference of two (SPSS output). Conjoint analysis Conjoint analysis is used to quantify the relationship between product utility and a product feature, such as price, warranty, color, and quality based on feedback from prospective customers. The level of product utility can be quantified by product ranking. Data used for conjoint analysis is usually acquired from survey studies where potential customers are asked to rate real or fictitious products with various combinations of features. The application of conjoint analysis enables product vendors to create the best combination of features to maximize the appeal of their products to potential buyers. Conjoint analysis is based on a combination of data transformation and linear regression. The dependent variable in a conjoint analysis is product ranking, which can be treated as either metric or nonmetric variable. 
Depending on how the dependent variable is treated, the analysis can be classified as metric conjoint analysis, where the dependent variable is treated as metric, and nonmetric conjoint analysis, where the dependent Introduction to Data Mining DIFF (Volume_2) 1.0 Partial ACF 0.5 0.0 0.5 1.0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 Lag number Coefficient Upper confidence limit Lower confidence limit Figure 7-20 PACF with difference of two (SPSS output). variable is regarded as nonmetric (Kuhfeld 2005). Here, we will focus on metric conjoint analysis. Metric conjoint analysis evaluates the relationship between feature utility (also known as part-worth utility) and product ranking. To illustrate this concept, consider the case of a product with two features, color and size. Assume that a car can be red, white, or blue. Here, color is one feature of interest, and red, white, and blue are this feature’s values. Assume size is another feature of interest, with small and large as its values. Assume that in a survey, two potential customers are shown cars with six combinations of colors and sizes and are asked to rate each car on a scale of 1 to 10. Table 7-11 shows the data collected from the survey. To conduct conjoint analysis, using the survey information exemplified in Table 7-11, we first derive a relationship between product rating and product features of the form yi , j colori size j i , j (7.53) 187 188 Data Mining and Market Intelligence Table 7-11 Data illustration for conjoint analysis (in practice, a large number of rating columns must be used to obtain statistically significant testimations) A large red car A large blue car A large white car A small red car A small blue car A small white car Rating given by potential customer 1 Rating given by potential customer 2 5 6 3 8 2 9 8 9 3 5 7 6 where yi,j is the rating when the color feature has a value i and the size feature has a value j. The value is the mean rating across colors and sizes. The parameters colori and sizej, which are computed using linear regressions, represent the deviations of ycolor, size from due to the color and size features (i stands for either red, blue, or white, and j for large or small), respectively. To estimate the values of these two parameters, we utilize linear regression after transforming the original product feature variables, which are categorical variables, into metric dummy variables, Xred, Xblue, Xwhite, Xlarge, and Xsmall. ycolor,size colorred Xred colorblue X blue colorwhite X white sizesmall Xsmall + sizelarge Xlarge + color,size (7.54) ANOVA is used to determine whether colori and sizej are statistically significant in impacting the product rating. Once colori and sizej are estimated and their statistical significance established, a decision can be made about which of the product features considered in the analysis must be incorporated into the product to maximize its appeal to customers. Logistic regression Logistic regression is used to estimate the probability of the occurrence of an event. In direct marketing, logistic regression is one of the most widely Introduction to Data Mining used techniques for building targeting models, such as response and conversion models. Figure 7-21 shows the relationship between the probability of the event occurring and value of the independent variable. 
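The S-shaped curve of Figure 7-21 is the logistic function itself; with hypothetical coefficients, the event probability can be evaluated directly. The following sketch assumes Python with NumPy, and the coefficient values are invented solely to reproduce the shape of the figure.

```python
import numpy as np

def event_probability(x, b0, b1):
    """Probability of the event as a logistic function of one independent variable."""
    return 1.0 / (1.0 + np.exp(-(b0 + b1 * x)))

# Hypothetical coefficients chosen only to illustrate the S-shaped response curve.
b0, b1 = -4.0, 0.3
for xi in np.linspace(0, 30, 7):
    print(f"x = {xi:5.1f}  ->  P(event) = {event_probability(xi, b0, b1):.2f}")
```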
The odd ratio of event Y, PY / PN , where PY and PN add up to one, is defined as the ratio of the probability of occurrence of Y (PY) and the 1.00 Probability of event occurring 0.90 0.80 0.70 0.60 0.50 0.40 0.30 0.20 0.10 0.00 0 5 10 15 20 25 30 Independent variable Figure 7-21 Probability of event occurring vs. value of independent variable. probability of no occurrence of Y (PN). In logistic regression, the natural logarithm of the odd ratio is assumed to be a linear function of the independent variables that are predictive of the outcome of event Y. A logistic regression model can be expressed as (Gujarati, 1988): ⎛P ln ⎜⎜⎜ Y ⎜⎝ PN ⎞⎟ ⎟⎟ 0 1 X1 2 X 2 k X k ⎟⎠ (7.55) where Xi’s are the independent variables and i’s are coefficients, whose statistical significance can be assessed with a chi-square statistic. To deal with nonmetric independent variables, we introduce dummy variables, which have a numeric value of zero or one depending on whether a particular state obtains or not. Take as an example, an income variable with three possible values, ‘high-income’, ‘medium-income’, and ‘low-income’. The variable needs to be transformed to three new binary variables named 189 190 Data Mining and Market Intelligence ‘high-income’, ‘medium-income’, and ‘low-income’. The value of each of these three variables for a member is one if the member is in the group and is zero otherwise. Based on Eq. 7.54, we can derive the probability of occurrence of event Y PY e 0 1X1 k Xk 1 e 0 1X1 k Xk k 1, 2, 3, … (7.56) Association analysis Association analysis addresses the question: What products do customers tend to purchase together? This tendency is expressed in terms of rules that indicate the likelihood of two or more products being purchased by the same person over a period of time. Symbolically, a rule indicating that a person who purchases product A tends to purchase product B is commonly expressed as A;B. Three standard measures are used to quantify the significance of such a rule: support, confidence, and lift. These measures are typically supported by data mining software, such as SAS Enterprise Miner, IBM Intelligent Miner, and Associate Engine. Confidence is the conditional probability that a customer will purchase product B given that he has purchased product A from a particular vendor. Lift measures how many times more likely is a customer who has purchased product A to purchase product B, when compared with those who are randomly selected from the sample. Support is the percentage of total transactions where A and B are purchased together from the vendor during a specified period of time. Combinations of products with high values for confidence, lift, and support are highly associated. There is no set criterion for a cut off threshold for any of these measures below which the level of association can be considered negligible. Once a vendor has identified a combination of products that are highly associated, the products can be marketed as a bundle for cross-sell purposes. Collaborative filtering Collaborative filtering aims to forecast individual preferences about product or services. To achieve this, collaborative filtering relies on the preferences of a group of people that are “similar” (in a sense we will describe shortly) to the individuals whose preferences are being studied. Collaborative filtering is a technique often applied in marketing for crosssell and up-sell analysis. 
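As a concrete illustration of the three association measures just defined, the following sketch computes support, confidence, and lift for a single rule A&#8658;B over a handful of hypothetical transactions (pure Python; the product names are invented):

```python
# Hypothetical transactions; each set holds the products bought by one customer
# over the analysis period.
transactions = [
    {"firewall", "vpn"},
    {"firewall", "vpn", "antivirus"},
    {"firewall"},
    {"antivirus"},
    {"firewall", "vpn"},
    {"vpn"},
]

def rule_metrics(transactions, a, b):
    """Support, confidence, and lift for the rule a => b."""
    n = len(transactions)
    n_a = sum(1 for t in transactions if a in t)
    n_b = sum(1 for t in transactions if b in t)
    n_ab = sum(1 for t in transactions if a in t and b in t)
    support = n_ab / n                 # P(A and B)
    confidence = n_ab / n_a            # P(B | A)
    lift = confidence / (n_b / n)      # P(B | A) / P(B)
    return support, confidence, lift

s, c, l = rule_metrics(transactions, "firewall", "vpn")
print(f"support = {s:.2f}, confidence = {c:.2f}, lift = {l:.2f}")
```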
To quantify similarity, we compute correlation between individuals based on their attributes, such as their ratings or purchases of particular Introduction to Data Mining products. Those who have the highest correlation with a particular individual are referred to as this individual’s nearest neighbors. Consider the following example: Five individuals (Susan, David, Anna, Linda, and Tom) are asked to rate six movies (‘Traffic’, ‘Cast Away’, ‘Chocolate’, ‘Mission Impossible’, ‘The Gift’, and ‘Quill’). Susan does not rate two of the six movies, ‘The Gift’ and ‘Quill’. A collaborative filtering analysis is conducted to predict which of these two movies is more appealing to Susan so that an appropriate movie recommendation can be offered to her. There are two questions that we need to address in this analysis. ● ● Who are Susan’s nearest neighbors? Which movie (‘The Gift’ or ‘Quills’) should we recommend to Susan? To identify Susan’s nearest neighbor, we first calculate the Pearson correlation coefficient between individuals (as we discussed in Chapter 6) based on the data in Table 7-12. Table 7-12 Movie ratings for collaborative filtering Individual name Susan David Anna Linda Tom Movie rating Traffic Cast Away Chocolate Mission Impossible The Gift Quills 5 4 3 1 2 1 2 1 2 3 4 4 4 5 4 3 3 3 1 2 N/A 1 2 2 2 N/A 2 5 4 3 The correlations turn out as follows Susan and David: 0.97 Susan and Anna: 0.81 Susan and Linda: 0.08 Susan and Tom: 0.15 David and Anna are Susan’s nearest neighbors since their correlation coefficients with Susan have the highest values. We next compute these who individuals’ average rating of the two movies that Susan has not rated (‘The Gift’ and ‘Quills’) The average of David’s and Anna’s ratings for ‘The Gift’ is 1.5, and for ‘Quills’ is 3.5. Therefore, ‘Quills’ should be recommended to Susan since its average rating from David and Anna is higher than that of ‘The Gift’. 191 192 Data Mining and Market Intelligence Collaborative filtering is frequently used in marketing for product recommendations that lead to potential cross sell and up sell opportunities. Amazon.com is an example of the use of this technique. ■ References Berry, M.J.A., and G. Linoff. Data Mining Techniques. John Wiley & Sons, New York, 1997. Box, G.E.P., G.M. Jenkins, and G.C. Reinsel. Time Series Analysis: Forecasting and Control, 3rd ed. Prentice Hall, Upper Saddle River, New Jersey, 1994. Breiman, L., J.H. Friedman, R.A. Olshen, and C.J. Stone. Classification and Regression Tree. Chapman & Hall/CRC, Boca Raton, FL, 1998. Brooks, C. Introductory Econometrics for Finance. Cambridge University Press, New York, 2002. S-E. Clausen. Applied Correspondence Analysis: An Introduction (Quantitative Applications in the Social Sciences). Sage Publications, Thousand Oaks, CA, 1998. Cooley, W.C., and P.R. Lohnes. Multivariate Data Analysis. John Wiley & Sons, New York, 1971. Dillon, W., and M. Goldstein. Multivariate Analysis – Methods and Applications. John Wiley & Sons, New York, 1984. Everitt, B.S., and S. Rabe-Hesketh. The Analysis of Proximity Data. Arnold, London, 1997. Greenacre, M.J. Practical Correspondence Analysis. In V. Barnett (ed.) Interpreting Multivariate Data (pp. 119–146). John Wiley & Sons, New York, 1981. Klecka, W.R. Discriminant Analysis (Quantitative Applications in the Social Sciences). Sage Publications, Thousand Oaks, CA, 1980. Kuhfeld, W.F. (2005). Conjoint Analysis SAS Technical Support Series TS-722H. SAS Institute, Cary, NC, 2005. Mead, R. 
The Design of Experiments – Statistical Principles for Practical Application. Cambridge University Press, New York, 1988. Miller, R., and D.W. Wichern. Intermediate Business Statistics: Analysis of Variance, Regression, and Time Series. The Dryden Press, HRW (Holt, Rinehart, and Winston, Inc.), Austin, TX, 1977. Neter, J., W. Wasserman, and M.H. Kutner. Applied Linear Statistical Models. Irwin, Homewood, IL, 1990. Rud, O.P. Data Mining Cookbook. John Wiley & Sons, New York, 2001. Snedecor, G.W., and W.G. Cochran. Statistical Methods, 8th ed. Iowa State University, Ames, Iowa, 1989. Struhl, S.M. Market Segmentation, An Introduction and Review. American Marketing Association, Chicago, IL, 1992. CHAPTER 8 Audience Segmentation This page intentionally left blank Audience segmentation is one of the most important marketing analytic topics and an enabler in matching the right products, marketing message, incentives, and creative with the right audiences. This chapter presents four case studies that demonstrate the application of data mining techniques to audience segmentation analysis. Effective segmentation is objective-driven. Segmentation analysis can be performed based on one or more of the following objectives. ● ● ● ● ● Understanding the audience behaviors Understanding the audience needs Understanding the audience values Understanding the audience product ownership Understanding the audience satisfaction or pain points. The four data mining techniques presented in the chapter are cluster analysis, CART, CHAID, and discriminant analysis. Cluster analysis is an unsupervised technique that examines the interdependence of a group of independent variables with no regard for a dependent variable. This technique is used when there is more than a single segmentation objective, such as understanding both the demographics and the behaviors of an audience. In this case, both objectives are important and there is no single designated dependent variable. In contrast, discriminant analysis and decision tree approaches such CHAID and CART, are supervised techniques where there is a predetermined dependent variable. These techniques are used when there is only one objective. For example, a segmentation study aiming at differentiating the values of different customers is better addressed by a decision tree analysis where customer value is the dependent variable. ■ Case study one: behavior and demographics segmentation Travel services firm Travel Wind conducts a survey to collect customer demographic and behavioral data. Travel Wind is offering basic travel services such as flight tickets and hotel bookings. The company is considering selling more upscale vacation packages to both domestic and foreign travelers. The survey intends to gauge the potential of such a premier offering. Travel Wind conducts a study to segment the customers based on their proclivity to domestic or foreign travel, their Internet and cell phone usage, and their demographic profiles. Travel Wind wants to use the segmentation results to drive effective creative and marketing messaging, as well as to select appropriate marketing channels for launching the new vacation package offering. 196 Data Mining and Market Intelligence The Travel Wind survey collects eight attributes from 4902 customers. Out of the 4902 customers, 4897 provide information on all the eight attributes. All attributes except age have a nonmetric data type. 
The eight attributes and their data types are: ● ● ● ● ● ● ● ● Age (metric) Household income (ordinal) Child presence (binary) Education (categorical) Internet usage (ordinal) Cellular phone usage (ordinal) Interest in US domestic travel (binary) Interest in foreign travel (binary). Travel Wind applies hierarchical agglomerative clustering to segment the 4897 members who answer all the survey questions. The technique initially treats every member as a cluster and then groups members based on inter-member distances. The data is divided into two sets, one for modeling building (training) and the other for validation. Around 70% of the data is randomly selected for model building and the remaining data is used for validation. Travel Wind uses the two-step cluster module of SPSS for the analysis. The SPSS proprietary module enables Travel Wind to analyze data of mixed data types (metric and nonmetric). The first step of the SPSS twostep analysis is a pre-cluster process that groups members into small subclusters. The second step of the SPSS two-step cluster analysis combines these small sub-clusters into larger clusters. The two-step cluster module uses the Akaike’s Information Criterion (AIC) statistic (Akaike 1974) to determine robustness of the model and the optimal number of clusters within the model. Lower AIC values indicate a more desirable configuration of model parameters. When a decrease in AIC becomes negligent, the clustering application stops and the analysis is completed. Model building Table 8-1 shows the SPSS output based on the model building data set. The decrease in AIC value becomes negligible after the first two clusters are created. Out of 3410 members used for model building, five of them cannot be classified into either cluster. The net number of members in the two clusters is 3405. As illustrated in Table 8-2, over half of the members are classified into cluster two. Cluster one consists of 32% of the sample while cluster two consists of 68% of the sample. Table 8-3 shows that the average age of household head in cluster one (68 years old) is higher than that in cluster two (51 years old). The overall average age of the members is 56. Audience Segmentation Table 8-1 AIC statistic for determining optimal number of clusters in case study one (model building stage) Number of clusters Akaike’s Information Criterion (AIC) 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 58129.304 52521.868 50432.398 48319.882 47094.284 45883.723 44838.798 44048.546 43260.541 42544.896 41886.134 41262.019 40658.209 40131.944 39657.293 AIC change (a) Ratio of AIC changes (b) Ratio of distance measures (c) 5607.436 2089.470 2112.515 1225.598 1210.562 1044.925 790.251 788.006 715.645 658.761 624.115 603.809 526.266 474.650 1.000 0.373 0.377 0.219 0.216 0.186 0.141 0.141 0.128 0.117 0.111 0.108 0.094 0.085 2.629 0.989 1.685 1.012 1.149 1.296 1.003 1.092 1.078 1.050 1.030 1.130 1.095 1.038 (a) The changes are from the previous number of clusters in the table. (b) The ratios of changes are relative to the change for the two-cluster solution. (c) The ratios of distance measures are based on the current number of clusters against the previous number of clusters. (d) Approximately 70% of cases (SAMPLE) Selected. Table 8-2 Cluster breakdown Cluster 1 Cluster 2 N Fraction of combined 1095 2310 32.3% 67.8% The statistics in Table 8-4 indicate that the members in cluster two are better educated than those in cluster one. In cluster two, 44% of the members have a graduate school degree compared to 24% in cluster one. 
Table 8-5 shows that the percentage of members with children in the household is much higher in cluster two (44%) than in cluster one (11%). 197 198 Data Mining and Market Intelligence Table 8-3 Mean and standard deviation of the age of household head by cluster Mean of age Standard deviation of age 67.75 50.89 56.31 13.967 9.652 13.709 Cluster 1 Cluster 2 Combined Table 8-4 Education level distribution by cluster High school College Graduate school Vocational school Freq. Cluster Freq. Cluster Freq. Cluster Freq. Cluster percent percent percent percent Cluster 1 Cluster 2 Combined 402 366 768 36.7 15.8 22.6 423 914 1337 38.6 39.6 39.3 262 1011 1273 23.9 43.8 37.4 8 19 27 0.8 0.8 0.7 Table 8-5 Presence of children in household by cluster Unknown Cluster 1 Cluster 2 Combined Without children With at least one child Freq. Cluster percent Freq. Cluster percent Freq. Cluster percent 138 85 223 12.6 3.7 6.6 833 1217 2050 76.1 52.7 60.2 124 1008 1132 11.3 43.6 33.3 Table 8-6 shows that members in cluster two are wealthier than those in cluster one. Of the cluster two members, 71% have an annual household income of over $100,000 while only 19% of the cluster one members have the same household income. Audience Segmentation Table 8-6 Household income distribution by cluster Less than $50 K Cluster 1 Cluster 2 Combined $50 K–$99,999 $100 K or higher Freq. Cluster percent Freq. Cluster percent Freq. Cluster percent 301 9 310 27.5 0.4 9.1 586 652 1238 53.5 28.2 36.4 208 1649 1857 19.0 71.4 54.5 With regard to travel proclivity, Table 8-7 shows that cluster one has a higher percentage of frequent foreign travelers (63%) than cluster two (52%). There is no significant difference in domestic travel proclivity between cluster one (79%) and cluster two (74%), as shown in Table 8-8. Table 8-7 Foreign travel proclivity by cluster Infrequent traveler Cluster 1 Cluster 2 Combined Frequent traveler Freq. Cluster percent Freq. Cluster percent 410 1114 1524 37.4 48.2 44.8 685 1196 1881 62.6 51.8 55.2 Table 8-8 Domestic travel proclivity by cluster Cluster 1 Cluster 2 Combined Infrequent traveler Frequent traveler Freq. Cluster percent Freq. Cluster percent 230 607 837 21.0 26.3 24.6 865 1703 2568 79.0 73.7 75.4 199 200 Data Mining and Market Intelligence In terms of Internet and cell phone usage, cluster two shows a much heavier usage of Internet and cell phone than cluster one (Table 8-9 and Table 8-10). Table 8-9 Internet usage by cluster Heavy user Cluster 1 Cluster 2 Combined Medium user Light user Freq. Cluster percent Freq. Cluster percent Freq. Cluster percent 265 1730 1995 24.2 74.9 58.6 255 401 656 23.4 17.4 25.1 575 179 754 52.5 7.8 22.1 Table 8-10 Cell phone usage by cluster Heavy user Cluster 1 Cluster 2 Combined Medium user Light user Freq. Cluster percent Freq. Cluster percent Freq. Cluster percent 559 2282 2841 51.1 98.8 83.4 430 28 458 39.3 1.2 13.5 106 0 106 9.6 0.0 3.1 Table 8-11 summarizes the characteristics of the two clusters. We can describe cluster one as the ‘Mass’ cluster and cluster two as the ‘Upscale’ cluster. Travel Wind can utilize the clustering analysis results and the following insight for effective targeting marketing. ● ● There are significant opportunities for selling travel vacation packages to the existing member base since over half of its members are either frequent foreign or domestic travelers. Travel Wind may consider launching two different sets of creative and messaging to the two distinct clusters based on income and age information. 
The members from the upscale cluster (cluster two) are likely to be either professionals at the peak of their careers or wealthy homemakers. The members of the mass cluster (cluster one) are likely to be retirees looking for a unique travel experience. Audience Segmentation Table 8-11 Summary of cluster characteristics for case study one (model building stage) Cluster number and description Demographics Travel propensity Internet and cell phone usage Cluster 1 (Mass) Education: High concentration of high school and college Frequent foreign travelers: 63% Internet usage: Light Household income: Medium Frequent domestic travelers: 79% Cell phone usage: High to medium Average age: 68 Presence of children: Low Cluster 2 (Upscale) Education: High concentration of college and graduate schools Household income: High Frequent foreign travelers: 52% Frequent domestic travelers: 74% Internet usage: High Cell phone usage: High Average age: 51 Presence of children: Medium ● ● In terms of vacation package pricing, Travel Wind may offer higherpriced packages to the upscale cluster and less pricey packages to the mass cluster. Travel Wind can leverage Internet and mobile marketing channels to reach the upscale cluster given that the members of this cluster tend to be heavy users of the two technologies. Model validation Travel Wind needs to validate the cluster model with the remaining 30% of the sample (1492 members). Three out of the 1492 records could not be 201 202 Data Mining and Market Intelligence classified into either cluster and are dropped out from the analysis. The net number of members classified into either cluster is 1489. The validation result is very similar to the result from model building. Therefore, we conclude that the clusters are very stable and robust. Table 8-12 shows the AIC statistic of the validation set. Table 8-12 AIC criterion for determining the optimal number of clusters for case study one (model validation stage) Number of clusters Akaike’s Information Criterion (AIC) 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 25538.570 22987.475 21914.764 21156.948 20497.376 19859.790 19363.405 19042.244 18744.128 18447.974 18166.176 17888.133 17676.206 17484.113 17314.920 AIC change (a) Ratio of AIC changes (b) Ratio of distance measures (c) 2551.095 1072.711 757.816 659.572 637.586 496.385 321.161 298.115 296.154 281.798 278.044 211.927 192.093 169.193 1.000 0.420 0.297 0.259 0.250 0.195 0.126 0.117 0.116 0.110 0.109 0.083 0.075 0.066 2.291 1.379 1.134 1.031 1.248 1.446 1.062 1.005 1.041 1.011 1.233 1.075 1.095 1.002 (a) The changes are from the previous number of clusters in the table. (b) The ratios of changes are relative to the change for the two-cluster solution. (c) The ratios of distance measures are based on the current number of clusters against the previous number of clusters. (d) Approximately 30% of cases (SAMPLE) Selected. Table 8-13 shows that 37% of the members are classified in cluster one and the remaining 63% of the members are classified in cluster two. The breakdown is similar to that of the modeling set. The statistics in Table 8-14 indicate that the average household head age in cluster one (67 years old) is higher than that in cluster two (51 years old). The overall average age of the member base is 56. Table 8-15 presents the statistics of education level by cluster. 
Members in cluster two are better educated than in cluster one since 45% of the Audience Segmentation Table 8-13 Cluster breakdown Cluster 1 Cluster 2 N % of Combined 557 932 37.4 62.6 Table 8-14 Average household head age by cluster Cluster 1 Cluster 2 Combined Mean of age Standard deviation of age 66.65 50.53 56.56 13.909 9.484 13.764 Table 8-15 Education level distribution by cluster High school College Graduate degree Vocational school Freq. Cluster Freq. Cluster Freq. Cluster Freq. Cluster percent percent percent percent Cluster 1 184 Cluster 2 153 Combined 336 33.0 16.4 22.6 231 356 587 41.5 38.2 39.4 135 415 550 24.2 44.5 36.9 7 8 15 1.3 0.9 1.0 cluster two members have a graduate school degree compared to 24% in cluster one. The percentage of members with children in the household is much higher in cluster two (46%) than in cluster one (14%) as shown in Table 8-16. Based on Table 8-17, cluster two is a wealthier group than cluster one. Among cluster two members, 73% have an annual household income of over $100,000 while only 24% of the cluster one members have the same household income. In terms of travel proclivity, Table 8-19 illustrates that cluster one has a slightly higher percentage of frequent foreign travelers (58%) than 203 204 Data Mining and Market Intelligence Table 8-16 Presence of children in household Unknown Cluster 1 Cluster 2 Combined Without children With at least one child Freq. Cluster percent Freq. Cluster percent Freq. Cluster percent 77 25 102 13.8 2.7 6.9 400 479 879 71.8 51.4 59.0 80 428 508 14.4 45.9 34.1 Table 8-17 Household income distribution by cluster Less than $50 K Cluster 1 Cluster 2 Combined $50 K–$99,999 $100 K or higher Freq. Cluster percent Freq. Cluster percent Freq. Cluster percent 123 3 126 22.1 0.3 8.5 302 253 555 54.2 27.2 37.3 132 676 808 23.7 72.5 54.3 Table 8-18 Domestic travel behavior by cluster Infrequent traveler Cluster 1 Cluster 2 Combined Frequent traveler Freq. Cluster percent Freq. Cluster percent 138 223 361 24.8 23.9 24.2 419 709 1128 75.2 76.1 75.8 cluster two (55%). According to Table 8-18, there is no significant difference in domestic travel proclivity between cluster one (75%) and cluster two (76%). With regard to Internet and cell phone usage (Table 8-20 and Table 8-21), cluster two shows heavier use of these technologies than cluster one. Table 8-22 summarizes the characteristics of the two clusters, cluster number and description. Audience Segmentation Table 8-19 Foreign travel behavior by cluster Infrequent traveler Cluster 1 Cluster 2 Combined Frequent traveler Freq. Cluster percent Freq. Cluster percent 237 416 653 42.6 44.6 43.9 320 516 836 57.5 55.4 56.2 Table 8-20 Internet usage by cluster Cluster 1 Cluster 2 Combined Freq. Cluster percent Freq. Cluster percent Freq. Cluster percent 160 718 878 28.7 77.0 59.0 133 150 283 23.9 16.1 19.0 264 64 328 47.4 6.9 22.0 Table 8-21 Cell phone usage by cluster Heavy user Cluster 1 Cluster 2 Combined Medium user Light user Freq. Cluster percent Freq. Cluster percent Freq. Cluster percent 302 931 1233 54.2 99.9 82.8 205 1 206 36.8 0.1 13.8 50 0 50 9.0 0.0 3.4 The results of the model validation phase are similar to the results of the model building phase. This is an indication that the model is fairly robust and stable. 
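The hold-out validation used here is easy to replicate with any clustering tool: fit the clusters on the model-building split, assign the hold-out members to those clusters, and compare cluster shares and profile statistics across the two splits. The following is a minimal sketch using k-means as a stand-in for the SPSS two-step module; the data frame, column names, and synthetic values are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
# Hypothetical member file; the real analysis uses the eight survey attributes.
members = pd.DataFrame({
    "age": rng.normal(56, 14, size=4897).round(),
    "income_band": rng.integers(1, 4, size=4897),    # ordinal stand-in
    "child_present": rng.integers(0, 2, size=4897),  # binary stand-in
})

# 70/30 split, as in the case study.
train, valid = train_test_split(members, train_size=0.7, random_state=1)

# Fit clusters on the model-building split and assign the hold-out members
# to the same clusters (k-means stands in for the SPSS two-step module).
km = KMeans(n_clusters=2, n_init=10, random_state=1).fit(train)
train = train.assign(cluster=km.labels_)
valid = valid.assign(cluster=km.predict(valid))

# Compare cluster shares and a profile statistic across the two splits;
# similar numbers indicate a stable, robust segmentation.
for name, df in [("building", train), ("validation", valid)]:
    shares = df["cluster"].value_counts(normalize=True).sort_index().round(3)
    mean_age = df.groupby("cluster")["age"].mean().round(1)
    print(name, shares.to_dict(), mean_age.to_dict())
```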
■ Case study two: value segmentation E-commerce firm Global Village needs to analyze its existing customer base to better understand how customer value is distributed across the 205 206 Data Mining and Market Intelligence Table 8-22 Summary of cluster characteristics for case study one (model validation stage) Cluster number and description Demographics Travel propensity Internet and cell phone usage Cluster 1 (Mass) Education: High concentration of high school and college Frequent foreign travelers: 58% Internet usage: Light Household income: Medium Frequent domestic travelers: 75% Cell phone usage: High to medium Average age: 67 Presence of children: Low Cluster 2 (Upscale) Education: High concentration of college and graduate schools Household income: High Frequent foreign travelers: 55% Internet usage: High Cell phone usage: High Frequent domestic travelers: 76% Average age: 51 Presence of children: Medium customer base. Customer value is defined as the customer ’s annual purchase dollar amount of Global Village products and is the dependent variable for the analysis. There are four independent variables in the analysis: household income, age, gender, and marital status. A total of 60,510 customers are included in the analysis. Seventy percent of the data (42,315 customers) is used for model building and 30% (18,195 customers) is used for validation. The C & RT module of the SPSS software is used for the analysis. Based on the CART technique introduced in Chapter 7, C & RT partitions a group of members to create homogeneous subgroups by maximizing a Audience Segmentation metric called ‘improvement’ at each split. This metric is defined by the decrease in least squares (LS) from the parent node to its child nodes divided by the total number of members in the data set. Model building There are 42,315 customers in the model building data set. Household income is identified as an independent variable that is highly associated with purchases. Five clusters are created as a result of the analysis, as illustrated in Figure 8-1. The higher the household income of a customer, the more likely he is to purchase from Global Village. Node 0 Total: 42,315 Mean annual purchase Amount: $1326.8 Improvement 1,763,200 Income $200K Node 1 Total: 35,533 Mean annual purchase Amount: $746.7 Income $200K Node 2 Total: 6782 Mean annual purchase Amount: $4366.2 Improvement 87,091 Income $100K Income Improvement 7383 Improvement 21,856 Income $75K Node 5 Total: 16,156 Mean annual purchase Amount: $394.7 $100K Node 4 Total: 10,602 Mean annual purchase Amount: $1240.5 Node 3 Total: 24,931 Mean annual purchase Amount: $536.7 Income Node 6 Total: 8775 Mean annual purchase Amount: $798.0 $75K Income $150K Node 7 Total: 5851 Mean annual purchase Amount: $1085.8 Income $150K Node 8 Total: 4751 Mean annual purchase Amount: $1431.0 Figure 8-1 Decision tree output for case study two (model building stage). To demonstrate how the metric ‘improvement’ is derived, we focus on node zero as a parent node and nodes one and two as its child nodes. 
LS in node zero is

  Σ_{i=1..42,315} (X_i − 1326.793)² = 5,250,280,769,315

LS in node one is

  Σ_{i=1..35,533} (X_i − 746.676)² = 412,915,613,804

LS in node two is

  Σ_{i=1..6782} (X_i − 4366.209)² = 4,762,754,569,013

The difference in LS between node zero and nodes one and two combined is

  5,250,280,769,315 − (4,762,754,569,013 + 412,915,613,804) = 74,610,586,498

The improvement, that is, the decrease in LS averaged over the 42,315 members in the data set, is

  74,610,586,498 / 42,315 = 1,763,218

The SPSS C & RT module continues to create nodes with maximum improvement in LS, and the resulting output is illustrated in Figure 8-1.

Model validation

There are 18,195 customers in the model validation data set. Figure 8-2 shows the results of the validation analysis. The tree again splits on household income at $200 K, $100 K, $75 K, and $150 K, with mean annual purchase amounts in the terminal nodes ranging from $408.4 (income below $75 K, 7086 customers) to $4039.0 (income above $200 K, 2921 customers). [Figure 8-2 Decision tree output for case study two (model validation stage).] The validation results are consistent with the model results, which suggests that the segmentation is fairly stable and robust.

■ Case study three: response behavior segmentation

Direct marketing firm Wonder Electronics has just completed a direct mail campaign and wants to segment the target audience based on their responsiveness. Wonder Electronics will leverage the analysis for its future customer acquisition programs. In addition to response data, four other attributes are collected and used in the analysis: marital status, presence of children, household income, and dwelling type (single-family homes or multiple-unit buildings). A total of 10,435 prospects are included in the analysis. Seventy percent of the data (7279 prospects) is used for model building and 30% (3156 prospects) is used for validation. The analysis is conducted with the CHAID module of the SPSS Classification Trees software. CHAID uses the chi-square statistic, and the significance level (p value) determines whether a split of a node in the tree is statistically significant. In the current example, the dependent variable (responsiveness to the direct marketing campaign) is nonmetric (more specifically, binary), so the chi-square statistic is used to assess the statistical significance of splits.

Model building

Figure 8-3 shows the SPSS output of the tree corresponding to the model building set. Three clusters are created and the overall response rate is 7.8%. The cluster of non-single prospects (married or with unknown marital status, node two) has the highest response rate (13.9%), followed by the cluster of single prospects without children (node four, response rate 8.8%) and the cluster of single prospects with children (node three, response rate 5.8%).
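Returning to the 'improvement' metric just derived for case study two, the calculation is simple enough to express directly in code. The sketch below is a from-scratch illustration rather than the SPSS C & RT module; the helper names are our own, and the least_squares function shows how each node's LS total would be obtained from its raw purchase amounts.

```python
def least_squares(values):
    """Sum of squared deviations from the node mean (the LS of a node)."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values)

def improvement(parent_ls, child_ls_values, n_total):
    """Decrease in LS from the parent node to its child nodes,
    divided by the total number of records in the data set."""
    return (parent_ls - sum(child_ls_values)) / n_total

# Check against the published LS totals for the split of node 0 into nodes 1 and 2.
parent_ls = 5_250_280_769_315
child_ls = [412_915_613_804, 4_762_754_569_013]
print(improvement(parent_ls, child_ls, 42_315))  # ~1,763,218
```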
Node 0 Total: 7279 # Responders: 566 Response rate: 7.8% Chi square 36.3, df1 1, p 0.00 Single Marital status Married or unknown Node 1 Total: 6646 # Responders: 478 Response rate: 7.2% Node 2 Total: 633 # Responders: 88 Response rate: 13.9% Chi square 21.7, df1 1, p 0.00 With children Node 3 Total: 3531 # Responders: 205 Response rate: 5.8% Without children Node 4 Total: 3115 # Responders: 273 Response rate: 8.8% Figure 8.3 CHAID output for case study three (model building stage). Validation The result of the validation set is consistent with that of the model building set, which suggests that the CHAID segmentation model is stable and robust (Figure 8-4). ■ Case study four: customer satisfaction segmentation Consumer electronics manufacturer Versatile Electronics conducts a customer satisfaction survey to collect overall satisfaction ratings on the various features of its latest camcorder model. There are four rating levels: four (excellent), three (good), two (fair), and one (poor). Data on a total of 14,951 responses is collected. Seventy percent (10,416 responders) of the data is used to build a discriminant model and the Audience Segmentation Node 0 Total: 3156 # Responders: 257 Response rate: 8.1% Single Marital status Married or unknown Node 1 Total: 2872 # Responders: 218 Response rate: 7.6% With children Node 3 Total: 1549 # Responders: 103 Response rate: 6.7% Node 2 Total: 284 # Responders: 39 Response rate: 13.7% Without children Node 4 Total: 1323 # Responders: 115 Response rate: 8.7% Figure 8-4 CHAID output for case study three (model validation stage). remaining 30% (4535 responders) of the data is used for validation. Every respondent is asked to rate his overall satisfaction of the product and his level of satisfaction of the following 11 attributes. 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. Quality Price Service Feature A Feature B Feature C Feature D Feature E Feature F Feature G Feature H The overall satisfaction rating of the product is the dependent variable. The satisfaction ratings on the 11 attributes are the discriminating variables. Discriminant analysis will derive the discriminant functions, represented by linear combinations of the 11 discriminating variables. 211 212 Data Mining and Market Intelligence Model building A sample of 10,416 responses is used for model building. Among them, 3158 are dropped due to incomplete data; as a result the net number of responders included in the analysis is reduced to 7258. Each of these respondents belongs to one of the four groups depending on his level of overall satisfaction of the product, excellent, good, fair, and poor. The F statistics illustrated in Table 8-23 are statistically significant (p value 0.05), which indicates that there are differences in the means of satisfaction rating of the 11 attributes between the four groups. Therefore, all the 11 attributes are selected as the discriminating variables for constructing discriminant functions. Table 8-23 Equality test of the satisfaction ratings group means for the 11 attributes in case study two (model building stage) Feature rated Wilks’ lambda F df 1 df 2 Sig. 
Quality Price Feature A Feature B Service Feature C Feature D Feature E Feature F Feature G Feature H 0.565 0.746 0.628 0.596 0.562 0.623 0.607 0.617 0.598 0.612 0.806 1859.020 824.277 1430.819 1636.679 1880.780 1465.850 1568.744 1498.611 1625.951 1530.941 583.153 3 3 3 3 3 3 3 3 3 3 3 7254 7254 7254 7254 7254 7254 7254 7254 7254 7254 7254 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 We create three discriminant functions for differentiating the four groups with different levels of overall product satisfaction, based on Wilks’ lambda (Dillon and Goldstein 1984) in Table 8-24. Table 8-24 Statistical significance of discriminant functions Test of function(s) Wilks’ lambda Chi-square df Sig. 1–3 2–3 3 0.327 0.958 0.987 8111.441 312.259 92.618 33 20 9 0.000 0.000 0.000 Audience Segmentation Table 8-25 presents the three corresponding eigenvalues, which quantify the percent of total variance explained by particular discriminant functions. Table 8-25 Discriminant functions and their corresponding eigenvalues Function(s) Eigenvalue % of Variance Cumulative % Canonical correlation 1 2 3 1.932 0.031 0.013 97.8 1.6 0.7 97.8 99.3 100.0 0.812 0.173 0.113 Table 8-26 presents the discriminant coefficients associated with each discriminant function and discriminating variable. Table 8-26 Discriminant functions, discriminating variables, and discri- minant coefficients Feature rated Function 1 Function 2 Function 3 Quality Price Feature A Feature B Service Feature C Feature D Feature E Feature F Feature G Feature H 0.311 0.042 0.189 0.159 0.169 0.093 0.093 0.108 0.237 0.245 0.106 0.055 0.160 0.139 0.046 0.161 0.225 0.020 0.063 0.143 1.041 0.331 0.433 0.059 0.629 0.085 0.438 0.018 0.516 0.070 0.507 0.162 0.387 We can visualize the distribution of the group members in a twodimensional plot with the first two discriminant functions as its axes in Figure 8-5. Validation A sample of 4535 responses is used for validation. Among them, 1372 are dropped due to incomplete data so the net number of responses included 213 Data Mining and Market Intelligence 8 6 4 Function 2 214 2 Poor Fair 0 Good Excellent 0 2 2 4 8 6 4 2 Function 1 Rating_overall Poor Fair Good Excellent Group centroid Figure 8-5 Group distribution by discriminant functions for case study four (model building stage). in the validation analysis is 3163. As Table 8-27 shows, the means of the 11 attributes are statistically significant between the four groups. Table 8-28 demonstrates the statistical significance of three discriminant functions. We can see that the results are similar to those from the model building step. Therefore, we conclude that the discriminant analysis is both stable and robust. The eigenvalues corresponding to the discriminant functions quantify the percentage of total variance explained by these discriminant functions as illustrated by Table 8-29. The discriminant coefficients associated with the 11 discriminating variables for the top three discriminant functions are illustrated in Table 8-30. Audience Segmentation Table 8-27 Equality test of the satisfaction ratings group means for the 11 attributes in case study two (model validation stage.) Feature rated Wilks’ lambda F df 1 df 2 Sig. 
Quality Price Feature A Feature B Service Feature C Feature D Feature E Feature F Feature G Feature H 0.543 0.712 0.617 0.564 0.523 0.577 0.577 0.584 0.601 0.604 0.812 887.015 425.810 653.768 813.871 959.846 772.547 770.958 750.739 698.160 691.480 243.601 3 3 3 3 3 3 3 3 3 3 3 3159 3159 3159 3159 3159 3159 3159 3159 3159 3159 3159 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 Table 8-28 Statistical significance of discriminant functions Test of function(s) Wilks’ lambda Chi-square df Sig. 1–3 2–3 3 0.299 0.943 0.989 3812.626 184.589 36.388 33 20 9 0.000 0.000 0.000 Table 8-29 Discriminant functions and the corresponding eigenvalues Function(s) Eigenvalue % of Variance Cumulative % Canonical correlation 1 2 3 2.159 0.048 0.012 97.3 2.2 0.5 97.3 99.5 100.0 0.827 0.214 0.107 By plotting the group members on a two-dimensional plot with the first two discriminant functions as its two axes, we can see the clear distinction between the four groups in Figure 8-6. 215 Table 8-30 Discriminant functions, discriminating variables, and dis- criminant coefficients Feature rated Function 1 Function 2 Function 3 Quality Price Feature A Feature B Service Feature C Feature D Feature E Feature F Feature G Feature H 0.307 0.081 0.158 0.166 0.208 0.120 0.095 0.104 0.188 0.255 0.092 0.367 0.034 0.203 0.298 0.025 0.089 0.033 0.084 0.049 0.839 0.053 0.195 0.096 0.684 0.080 0.440 0.373 0.378 0.140 0.525 0.299 0.006 Function 2 2.5 0.0 Poor Good Fair Excellent 2.5 5.0 7.5 5.0 2.5 0.0 2.5 Function 1 Rating_overall Poor Fair Good Excellent Group centroid Figure 8-6 Group distribution by discriminant functions for case study four (model validation stage). Audience Segmentation The result of the validation set is consistent with that of the model building set, which suggests that the discriminant model is stable and robust. ■ References Dillon, W., and M. Goldstein. Multivariate Analysis – Methods and Applications. John Wiley & Sons, New York, 1984. Akaike, H. A new look at the statistical model identification. IEEE Transactions on Automatic Control, New York, 19(6): 716–723, 1974. 217 This page intentionally left blank CHAPTER 9 Data Mining for Customer Acquisition, Retention, and Growth This page intentionally left blank This chapter presents three case studies that demonstrate the application of data mining techniques to acquiring new customers as well as growing and retaining existing customers. We apply to these case studies logistic regression, a technique that can be implemented by a variety of tools such as SAS, IBM Intelligent Miner, Knowledge STUDIO, and SPSS. ■ Case study one: direct mail targeting for new customer acquisition Catalog marketer Mountaineer needs to mail its new holiday catalogs to a list of prospects. The firm plans to analyze the result of its last holiday mailing and apply that learning to the upcoming mailing. The last catalog mailing by Mountaineer was made to a total of 52,000 prospects, which resulted in a purchase rate of 1.2%, or 624 purchases. At the time of the last mailing, Mountaineer also defined a control group of 52,000 prospects that did not receive a catalog. The control group yielded a purchase rate of 0.5%, 260 purchases. The last catalog generated a lift of 140% in purchase rate from 0.5% to 1.2%. There are four distinct segments in the mailing group and the control group combined. 
● ● ● ● Prospects who made a purchase after receiving a catalog Prospects who did not make a purchase after receiving a catalog Prospects who made a purchase without receiving a catalog Prospects who did not receive a catalog and did not make a purchase. Obviously, some prospects made a purchase regardless of whether they received a catalog. In contrast, some prospects made a purchase only after receiving a catalog. Two probabilities need to be calibrated to fully understand a prospect’s purchase behavior. These two probabilities are the probability of making a purchase after receiving a catalog, and the probability of making a purchase without receiving a catalog. The most ideal prospects for targeting are those with a very low probability of making a purchase when not receiving a catalog, but with a very high probability of making a purchase after receiving a catalog. The objective of a targeting model is to identify and select prospects with the highest difference in these two probabilities. This train of thought leads to a twomodel approach, one model based on a sample of prospects not receiving a catalog, and the other model based on a sample of prospects receiving 222 Data Mining and Market Intelligence a catalog. This approach is referred to as the incremental purchase modeling approach. Models created with this approach follow the standard model building process comprising building, validation, and test phases. For simplicity, this chapter only presents the final results of the models without further elaboration on the validation and test phases. Purchase model on prospects having received a catalog Mountaineer utilizes the purchase data of 5790 prospects targeted by the last catalog for building a purchase model. Data of 4053 prospects (70% of the data) is used for modeling and the remaining 30% used for model validation. The dependent variable is a purchase flag indicating the existence of any purchase by a particular prospect. The value of the purchase flag is one, if any purchase has happened, and zero otherwise. Four prospect attributes are used as the independent variables. ● ● ● ● Age: The five age groups are under 18, 18–24, 25–34, 35–44, and 45–54. Household income: There are six income groups: $0–50 K, $50 –75 K, $75 –100 K, $100 –125 K, $125 –150 K, and over $150 K. Home ownership: The variable has a binary value. A value of one indicates home ownership and zero lack of home ownership. Not interest in outdoors: Prospects who are interested in outdoors have a value of zero and prospects not interested in outdoors have a value of one for this variable. The independent variables are all categorical variables and therefore they need to be transformed into binary variables prior to being fed into a logistic regression model. The purchase model is built with the backward stepwise logistic regression method. The backward stepwise logistic regression method employs a combination of the backward removal and the forward entry methods (Neter, Wasserman, and Kutner 1990). In the analysis of Mountaineer data, the criterion for variables to exit the model in the backward removal phase is 0.10 and to enter the model in the forward entry phase 0.05 respectively. Table 9-1 shows that three out of the four independent variables are predictive of purchase behavior. These three predictive independent variables are age, household income, and interest in the outdoors. Prospects with the following characteristic are more likely to make a purchase when receiving a catalog. 
Data Mining for Customer Acquisition, Retention Table 9-1 Model parameter estimates for case study one (model build- ing stage.) Parameter estimate Intercept Dummy variable Age: 18–24 Age: Under 18 Age: 35–44 Age: 25–34 Income: $75–100 K Income: $50–75 K Income: $100–125 K Income: Less than $50 K Income: $125–150 K Not interested in outdoors p Value Exponential of parameter estimate 1.885 0.000 6.584 1.304 0.191 0.115 0.588 0.246 0.896 0.511 2.437 0.743 1.057 0.000 0.332 0.429 0.000 0.106 0.000 0.023 0.000 0.000 0.000 0.271 0.826 1.122 0.556 0.782 0.408 0.600 0.087 0.475 0.348 ● Age between 35 and 44 The probability of a prospect making a purchase after receiving a catalog is p exp(L) 1 exp(L) (9.1) where L 1.885 1.304 Age18−24 0.191 Ageunder 18 0 .115 Age35−44 0.588 Age 25 – 34 0.246 Income75 – 100 K 0.896 Income50 – 75 K 0.511 I ncome100−125 K 2.437 Income 50 K 0.743 Income125−150 K 1.057 Outdoors no interest The value of each of the variables on the right hand side of the equation is one when the condition indicated by the subscript holds and zero otherwise. For instance the value of the variable Age18–24 is one when the age of a prospect is between 18 and 24, and zero otherwise. A positive parameter indicates a higher probability of making a purchase when the condition indicated by the subscript is true. Prospects aged between 35 and 44 are more likely to make a purchase than those aged over 44 or under 223 224 Data Mining and Market Intelligence 35 because the parameter associated with the 35–44 age group is positive. The estimated probability that a prospect aged between 18 and 24, with a household income between $75 K and $100 K, and with an interest in outdoors makes a purchase after receiving a catalog is pcatalog exp(1.885 1.304 0.246) 0.5825 1 exp(1.885 1.304 0.246) (9.2) When a purchase model is used to predict purchase behavior, it is a common practice to classify a prospect as a potential buyer if his probability of purchase is greater than 0.5 (50%) and as a potential non-buyer if his probability of purchase is less than 0.5 (50%). A probability of 0.5 indicates uncertain purchase behavior. Thus, the model predicts that a prospect aged between 18 and 24, with a household income between $75 K and $100 K, and with an interest in the outdoors is likely to make a purchase when targeted by a catalog. Purchase model based on prospects not having received a catalog We next build the second purchase model by utilizing a sample of 9760 prospects who did not receive Mountaineer ’s last catalog. The same independent and dependent variables used in the previous section are used in building the second purchase model. We conduct the analysis with the backward stepwise option of the logistic regression module of SPSS. The statistical significance criterion for variables to exit the model and to entry the model is 0.10 and 0.05. Table 9-2 shows that all four independent variables are predictive of purchase behavior. Prospects with the following characteristics are more likely to make a purchase when not receiving a catalog. ● ● ● ● Age between 45 and 54 or under 18 Income less than $50K or between $75K and $125K With home ownership Not interested in outdoors. The probability of a prospect making a purchase without receiving a catalog is pno catalog exp(L) 1 exp(L) (9.3) Data Mining for Customer Acquisition, Retention Table 9-2 Model parameter estimates for case study one (model valida- tion stage.) 
Intercept Dummy variable Age 45–54 Age 35–44 Age 25–34 Age Under 18 Income $75 –100 K Income $50 –75 K Income $100 –125 K Income Less than $50 K No home ownership Not interested in outdoors Parameter estimate p value Exponential of parameter estimate 2.934 0.000 0.053 0.766 0.074 0.100 0.520 0.734 0.327 1.416 0.974 0.411 2.441 0.004 0.780 0.717 0.074 0.000 0.047 0.000 0.011 0.000 0.000 2.152 1.077 0.905 1.682 2.084 1.387 4.122 2.648 0.663 11.487 with L 2.934 0.766 Age 45−54 0.074 Age35−44 0.100 Age25−34 0.520 Ageunder 18 0.734 Income75−100 K 0.327 Income50−75 K 1.416 × Income100−125 K 0.974 Income 50 K 0.411 Homeno home 2.441 Ou tdoors no interest As an application of this model estimation, consider the probability that a prospect aged between 18 and 24, with a household income between $75 K and $100 K, and with an interest in outdoors makes a purchase without receiving a catalog. pno catalog exp(2.934 0.734) 0.0998 1 exp(2.934 0.734) (9.4) Based on Eqs. (9.2) and (9.4), we conclude that prospects aged between 18 and 24, with a household income between $75 K and $100 K, and with an interest in outdoors are not likely to make a purchase without receiving a catalog but become likely to make a purchase after receiving a catalog. 225 226 Data Mining and Market Intelligence Prospect scoring So far Mountaineer has built two targeting purchase models. The first model is used to predict the probability of purchase if targeted by a catalog. The second model is used to predict probability of purchase when not mailed a catalog. Mountaineer will apply these two models to select its targeted mailing list. Mountaineer acquires a list of 100,000 potential prospects for the next catalog mailing and has a budget for targeting only 20,000 of them. The firm also plans to set aside a control group of 20,000 prospects that will not receive a catalog. To effectively select the prospect target list, Mountaineer undertakes the following steps. 1. Randomly select 20,000 prospects for the control group. 2. Score the remaining 80,000 prospects with the first purchase model to compute their expected purchase probability if targeted by a catalog. These prospects then are classified as buyers and non-buyers depending on whether their purchase probability exceeds the 0.5 threshold. 3. Score the same 80,000 prospects with the second purchase model to calculate their probability of making a purchase if not targeted by a catalog. The same probability threshold of 0.5 is used to classify the prospects as buyers and non-buyers when not targeted. 4. Select the prospects predicted to be buyers by the first purchase model and to be non-buyers by the second purchase model. 5. Select 20,000 prospects resulted from the previous step with the highest probabilities of making a purchase when targeted by a catalog. Modeling financial impact This section illustrates the financial impact of purchase models based on the following assumptions of Mountaineer. 
● ● ● ● Cost per catalog (cost of catalog production and mailing postage): $0.5 Average purchase amount per buyer: $80 Average cost of goods and shipment: $50 Expected purchase rate of the mailing of 20,000 prospects: 3.0% The net revenue from the mailing group is equal to Total sales Total costs $80 (20 , 000 3%) $50 (20 , 000 3%) $0.5 20 , 000 $8 , 000 Data Mining for Customer Acquisition, Retention The return on investment of the mailing group is Net revenue $8000 20% Total cost $40 , 000 By applying purchase models to improve the purchase rate of the mailing group, Mountaineer can generate a 20% return on investment. ■ Case study two: attrition modeling for customer retention Credit lender UNC Bank has lately observed a high attrition rate among its small business customers. UNC plans to build an attrition model to predict future attrition behavior with the intention to timely detect customers who are likely to defect. UNC has collected six variables (company size, headquarters location, industry, number of years since establishment, annual sales volume, and number of offices) about its small business customers. These are the independent variables. The dependent variable is a binary attrition flag with a value of one indicating attrition, and zero otherwise. UNC utilizes the historical data of 47,056 customers for building an attrition model. Data on 32,939 prospects (70% of the data) is used for modeling and the remaining 30% used for model validation. The attrition model is created with the stepwise logistic regression module in SPSS. The statistical significance criterion for variables to exit the model and to entry the model is 0.10 and 0.05. Customers with the following characteristics are more prone to attrition. ● ● ● ● Customer headquarter located in states NV, PR, VI, AR, DC, AL, or FL Customers in one of the following industries: apparel, manufacturing, insurance, services, auto repair, restaurants, food catering, transportation, mortgage, or travel Company size (number of employees) smaller than 250 Company annual revenue between $10 M and $50 M. Based on a logistic regression analysis, the probability of a customer canceling his credit account is pattrition exp(L) 1 exp(L) 227 228 Data Mining and Market Intelligence The SPSS model output gives the following construction for L L 1.823 0.370 StateCA 0.281 StateAZ 0.321 StateCT 0.234 St a teOH 0.586 StateNM 0.157 StateNY 0.527 StateRI 0.005 Sta teGA 0.557 StateNV 0.439 StatePA 0.364 StatePA 0.045 Stat ePR 0.295 StateDE 0.216 State VA 0.360 StateWA 0.132 Sta t e VI 0.085 StateOR 0.115 StateAR 0.413 State DC 0.640 Stat eAL 1.324 StateFL 0.710 Industry apparel 0.029 Industry banking 0.217 Industry manufacturing 0.029 Industry insurance 0.018 Industry services 0.084 Industry auto repairs 0.286 Industry restaurant 0.054 Industry education 0.798 Indu stry food catering 0.123 Industry construction 0.153 Indu stry transportation 0.128 Industry mortgage 0.191 Industry travel 0.467 Industry legal 0.682 Number of years in busines s 0.049 Size 250 0.135 Size[100 , 249] 0.209 Size[ 50 , 99] 0.290 Size[ 20 , 49] 0.199 Size[1, 19] 0.3642 Regional 0.275 Nat ional 0.021 Sales($25 M ,$50 M] 0.118 Sales($10 M ,$25 M] 0.167 Sales($5 M ,$10 M] 0.531 Sales($1 M ,$5 M] 0.849 Sales $1 M (9.5) As an application of Eq. (9.5), consider a company in the apparel business is located in Florida, has a size of 250 employees, and has annual sales of $50 million. Replacing these values in Eq. 
(9.5), and assuming that the remaining dummy variables take on the value 0, we get the following probability of attrition pattrition exp(L) exp(1.823 1.324 0.710 0.049 0.0211) 0.98 1 exp(L) 1 exp(1.823 1.324 0.710 0.049 0.021) UNC bank can leverage the above attrition model to predict which customers are likely to cancel their credit lines. The bank can salvage these customers by offering them incentives such as favorable credit terms. Data Mining for Customer Acquisition, Retention ■ Case study three: customer growth model Companies can leverage customer growth models to increase sales to their existing customers. Customer growth models predict the probability that a customer will grow his purchase amount over time. In practice, most companies focus on customers likely to have significant growth rather than marginal growth in their purchase amount. A growth model is built on historical data where the dependent variable is a binary growth flag with a value of one or zero. A customer with a growth flag of one is a customer that grew his purchase amount above a pre-determined percentage during a specific time frame. This pre-determined percentage is set arbitrarily by a company. A customer with a growth flag of zero is one whose growth in purchase amount does not meet this pre-determined growth percentage. The following section discusses a case study where stepwise logistic regression is applied to build a customer growth model. Insurance company, Safe Net, plans to pilot a comprehensive insurance package that allows policyholders to combine several types of insurance policies (health, auto, life, accidental, and home) into one policy with a fixed premium. This new product will allow the company to minimize operational costs and to increase revenues by cross-selling additional insurance coverage to existing policyholders. This new product will also benefit policyholders in that it will make it easy for them to manage multiple policies and to enjoy potential discounts. In order to take advantage of the highest revenue potential, Safe Net decides to target those policyholders that are likely to grow their insurance purchases with the firm. With this objective in mind, Safe Net needs to build a growth model to predict each customer ’s probability of increasing his insurance purchase. Safe Net defines a growth policyholder as one that increases his insurance purchase by at least 5% over the past three years. The firm draws a random sample of 95,953 policyholders. Data on 70% of the policyholders, or 66,915 of them are used for model building and the remaining 30% used for validation. The following five policyholder attributes are used as independent variables. ● ● ● ● ● Family annual income Residence state Profession Number of members in the family Policyholder age. 
229 230 Data Mining and Market Intelligence The analytic result shows that the probability of growth at policyholder level is pgrowth 5% exp(L) 1 exp(L) where L 1.770 2.292 StateCA 2.230 StateAZ 2.264 StateCT 2.161 S tateOH 2.121 StateNM 1.946 StateNY 1.997 StateRI 1.972 S tateGA 2.097 StateNV 2.132 StatePA 2.054 StateMA 2.115 S tatePR 2.175 StateDE 2.067 State VA 2.263 StateWA 2.075 S tate VI 2.157 StateOR 2.060 StateAR 2.287 StateDC 2.097 S tateAL 2.909 StateFL 0.152 Industry apparel 0.060 Industry banking 0.073 Industry manufacturing 0.033 Industry insurance 0.116 Industry services 0.042 Industry auto repairs 0.022 Industry restaurant 0.036 Industry education 0.041 I ndustry food catering 0.080 Industry construction 0.047 Indu stry transportation 0.047 Industry mortgage 0.076 Industry travel 0.253 Industry legal 0.146 Age35−44 0.086 Age18 0.004 Age18−24 0.017 Age 45−54 0.242 Household size1 0.486 Household size2 0.566 Household size3 0.209 Household size 4 0.016 Household size5 0.169 Household size6 (9.6) From Eq. (9.6), we conclude the following ● ● ● Geographically, policyholders in FL, DC, and CA are more likely to grow their insurance purchase than policyholders in the other states. Policyholders in apparel, manufacturing, and construction are more likely to increase their purchases than policyholders working in other industries. The 35–44 age group has a higher likelihood of increasing insurance purchase amount than other age groups. The probability of growing insurance purchase amount with a policyholder located in CA, working in the manufacturing industry, aged between 35 and 44, and with a household of four people is Data Mining for Customer Acquisition, Retention pgrowth 5% exp(1.770 2.292 0.073 0.146 0 .209) 0.63 1 exp(1.770 2.292 0.073 0.146 0 .209) Safe Net can use the formula in Eq. (9.6) to score its existing policyholders and target those with the highest probability of growth with the firm’s new insurance package. With this approach, in order to insure a maximum level of marketing returns, the firm invests its marketing dollars on those customers most likely to grow their insurance purchase. ■ Reference Neter, J., W. Wasserman, and M.H. Kutner. Applied Linear Statistical Models. Irwin, Homewood, IL, 1990. 231 This page intentionally left blank CHAPTER 10 Data Mining for CrossSelling and Bundled Marketing This page intentionally left blank Cross-selling and bundled marketing both refer to marketing additional products to existing customers. This chapter presents two case studies that demonstrate the application of data mining to creating effective cross-selling or bundled marketing strategies. The case studies utilize association analysis (also known as market basket analysis), a data mining technique that can be implemented with software applications such as Association Engine, SAS Enterprise Miner, and IBM Intelligent Miner. Association analysis helps us to understand which products that customers purchase together. The technique addresses the question: if a customer purchases product A, how likely is he to purchase product B? The question is often expressed in terms of a ‘rule,’ expressed as follows: A B As described in Chapter 7, three standard measures are used to assess the significance of a rule: support, confidence, and lift. Confidence is the conditional probability that a customer will purchase product B given that he has purchased product A. 
Lift measures how many times more likely is a customer who has purchased product A to purchase product B, when compared with those who have not purchased product A. Support is the percentage of total transactions where A and B are purchased together during a specified period of time. ■ Association engine Association Engine, a tool developed by Octanti Associates, Inc. (www. octanti.com) is used to analyze the data. The tool was developed in the C programming language and has an EXCEL interface. There are three sections displayed by the EXCEL interface of Associate Engine. The first section titled ‘Input’ requires the following input fields from the user. ● ● ● Minimum product count: Products needs to meet the minimum count to be included in the analysis. If the minimum product count is 50, then only those products with at least 50 purchase records are included in the analysis. Maximum rule output: The field specifies the number of rules to be displayed in the output section. Input file: This field identifies the location and the name of the input file. The input file needs to be in the text format with three variables (ID, PID, and product). ID, PID, and product refer to customer ID, product ID, and product name, respectively. The user of Associate Engine has the option of showing the analysis results with either product ID or product name. 236 Data Mining and Market Intelligence ● ● ● Rule sorting: The user can specify how the results are sorted. The results can be sorted by support, by lift or by confidence. Rule output: This field gives a user the option of outputting the result with either product ID or product name. Engine directory: This is where a user can specify the location of the Association Engine application. On the lower left corner of the display screen of Association Engine is the data summary section. This section shows the basic statistics of the input data. ● ● ● ● ● ● ● Original number of records: This field indicates the number of transaction records in the input file. Number of records after de-dupping: Association Engine performs an automatic de-dupping of the raw transaction data. If a customer purchases product A three times, there will be three duplicate records of customer product combinations in the raw data. The de-dupping process of Association Engine will remove two out of the three records and create a unique record. Number of records after filtering: This field shows the number of transaction records after Association Engine removes the products that do not meet the minimum product count requirement. Number of customers before filtering: This shows the number of unique customer ID before Association Engine removes the products that do not meet the minimum product count requirement. Number of customers after filtering: This indicates the number of unique customer ID after removing products that do not meet the minimum product count requirement. Number of products before filtering: This field is the number of unique product ID before removing products that do not meet the minimum product count requirement. Number of products after filtering: Number of unique product ID after removing products that do not meet the minimum product count requirement. On the right hand side of the display screen of Associate Engine is the analysis output section where top key rules and three statistics (support, lift, and confidence) are displayed. ■ Case study one: e-commerce cross-sell E-commerce firm Horizon offers an array of consumer products online. 
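The support, confidence, and lift measures defined above can be computed directly from raw customer-product records before turning to the Horizon data. The sketch below is a from-scratch illustration, not the Association Engine tool; lift is computed here as confidence divided by the overall purchase rate of product B, the common formulation, which closely approximates the "compared with non-purchasers of A" description when A is bought by only a small share of customers. All data in the sketch are hypothetical.

```python
from collections import defaultdict

# Toy transaction records: (customer ID, product purchased). Real input would
# come from a customer/product table such as the one used in this case study.
transactions = [
    (1, "A"), (1, "B"), (2, "A"), (2, "B"), (2, "C"),
    (3, "A"), (4, "C"), (5, "B"), (5, "C"),
]

baskets = defaultdict(set)          # de-dup to one record per customer/product
for customer, product in transactions:
    baskets[customer].add(product)

def rule_stats(a, b):
    """Support, confidence, and lift for the rule a -> b."""
    n = len(baskets)
    with_a = [items for items in baskets.values() if a in items]
    with_b = sum(1 for items in baskets.values() if b in items)
    both = sum(1 for items in with_a if b in items)
    support = both / n
    confidence = both / len(with_a) if with_a else 0.0
    lift = confidence / (with_b / n) if with_b else 0.0
    return support, confidence, lift

print(rule_stats("A", "B"))  # e.g. support 0.4, confidence ~0.67, lift ~1.1
```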
The company wants to increase the sale of LCD Flat Panel HDTVs given Data Mining for Cross-Selling and Bundled Marketing the very attractive margins of these products. The marketing executives at Horizon believe that it would be less expensive and more profitable to cross-sell this product to existing customers than it would be to acquire brand new customers. Horizon has 2 years of transactional data, which consists of basic customer (customer ID, names, and addresses) and purchase information (product purchase date, product purchased, units purchased, and purchase dollar amount). A cross-selling analysis is conducted by utilizing a 5% random sample (69,671 records) from the 2-year transactional data. Two specific variables, customer ID and product purchased, are used for the analysis. Table 10-1 illustrates partial raw data used for the analysis. Table 10-1 Raw data subset used for Horizon cross-selling analysis Customer ID Product purchased 10001 10001 10002 10003 10003 10004 10004 10004 10004 HP 710C Inkjet Printer IBM Thinkpad Laptop Toshiba laptop Apollo p1200 Color Inkjet LCD Flat Panel HDTV Camcorder with HD Video LCD Flat Panel HDTV VCR Sony VAIO laptop Model building Around half of the sample data, or 34,929 transaction records, is used for building an association model. After de-dupping and removal of products that do not meet the minimum product count requirement (50 in this case), 27,264 unique transaction records, 18,172 unique customers and 148 products are included in the analysis. Table 10-2 shows the partial results of the analysis. Association Engine outputs the first 20 rules as instructed. Consider the second and the fourth rule as examples. The second rule in the main output shows that those who purchase VCRs often purchase LCD Flat Panel HDTV as well. The three statistics (support, lift, and confidence) for the second rule are 0.17%, 92.1, and 43.4%. The support statistic, 0.17%, is the percentage of customers that purchased both VCRs and LCD Flat Panel HDTVs. The lift statistic, 92.08, shows that customers who purchased VCRs were 92.08 times more likely to purchase LCD Flat Panel HDTVs than randomly selected customers. The confidence statistic is the conditional 237 238 Data Mining and Market Intelligence probability that indicates that 43.37% of the customers who purchased VCRs also purchased LCD Flat Panel HDTVs. The fourth rule in the main output shows that Sony VAIO laptops were often purchased together with LCD Flat Panel HDTVs by the same customers. The support statistic indicates that 0.12% of the customers purchased both products. Those who purchased Sony VAIOs were 89.39 times more likely to have purchase LCD Flat Panel HDTVs. A significant percentage (42.11%) of those customers who purchased SONY VAIOs bought LCD Flat Panel HDTVs. Table 10-2 Partial association analysis results for case study one (model building stage) Model output Rule Support (%) Lift Confidence (%) StarOffice 5.0 Personal Edition for Linux (Intel) → Red Hat Linux 6.0 for Intel Systems 0.53 20.39 49.11 VCR → LCD Flat Panel HDTV 0.17 92.08 43.37 Seagate 6.5 GB EIDE Hard Drives → Hard Drive Cable Pack 0.11 104.36 43.14 Sony VAIO laptop → LCD Flat Panel HDTV 0.12 89.39 42.11 Zobmondo!! Lite boardgame → Zobmondo!! 
Original boardgame 0.12 88.31 40.32 Sony VAIO laptop → VCR 0.11 96.75 38.60 LCD Flat Panel HDTV → VCR 0.17 92.08 36.73 KB Gear JamCam/Web Page Power Pack Combo → KB Gear JamCam 0.12 16.93 34.67 Canon digital camera → IBM Thinkpad 0.47 15.63 32.23 HP 710C inkjet printer → Toshiba laptop 0.23 22.10 30.38 (Continued) Data Mining for Cross-Selling and Bundled Marketing Table 10-2 (Continued) Model output Rule Support (%) Lift Confidence (%) Apollo P1200 color inkjet → Toshiba laptop 0.09 21.94 30.16 Camcorder with HD video → IBM Thinkpad 0.28 14.24 29.35 Apollo P1200 color inkjet → DISC burner 0.09 54.53 28.57 DVD player → IBM Thinkpad 0.24 13.66 28.16 DVD player → Canon digital camera 0.23 18.67 27.01 VCR → Sony VAIO laptop 0.11 96.75 26.51 Zobmondo!! Original boardgame → Zobmondo!! Lite boardgame 0.12 88.31 26.32 Camcorder with HD video → Canon digital camera 0.25 17.88 25.87 Hard Drive Cable Pack → Seagate 6.5 GB EIDE Hard Drives 0.11 104.36 25.58 Apollo P1200 color inkjet → HP 710C inkjet printer 0.08 33.44 25.40 Model validation The remaining data sample (34,742 transactional records) is used for model validation. After de-dupping and removal of products that do not meet the minimum product count requirement, 27,143 unique transaction records, 18,170 unique customers and 150 products are included in the analysis. Table 10-3 illustrates the first 20 rules created by the Association Engine. The third and the fifth rule in the results are similar to the second and the fourth rule in the model building output previously discussed. The third rule in the main output shows that those who purchased VCRs were likely to purchase LCD Flat Panel HDTVs. The three statistics (support, lift, and confidence) of this particular rule are 0.17%, 91.8, and 39.33%. The fifth rule in the validation results indicates that those who purchased Sony VAIO laptops were likely to have purchased LCD Flat 239 240 Data Mining and Market Intelligence Panel HDTVs. The three statistics (support, lift, and confidence) of the fifth rule are 0.11%, 81.50, and 34.92%, respectively. The validation results are consistent with the model results, which confirms that Horizon should Table 10-3 Validation results for association analysis in case study one Model output Rule Support (%) Lift Confidence (%) StarOffice 5.0 Personal Edition for Linux (Intel) → Red Hat Linux 6.0 for Intel Systems 0.62 19.59 48.86 LCD Flat Panel HDTV → VCR 0.17 91.78 39.33 VCR → LCD Flat Panel HDTV 0.17 91.78 39.33 Seagate 6.5 GB EIDE Hard Drives → Hard Drive Cable Pack 0.11 97.10 38.33 Sony VAIO laptop → LCD Flat Panel HDTV 0.11 81.50 34.92 Sony VAIO laptop → VCR 0.10 77.79 33.33 KB Gear JamCam/Web Page Power Pack Combo → KB Gear JamCam 0.12 16.03 33.33 Canon digital camera → IBM Thinkpad 0.50 15.30 32.49 Zobmondo!! Lite boardgame → Zobmondo!! 
Original boardgame 0.09 86.31 31.58 DVD player → Canon digital camera 0.26 20.19 30.81 Camcorder with HD video → IBM Thinkpad 0.33 14.30 30.36 DISC burner → Toshiba laptop 0.15 23.20 28.70 Hard Drive Cable Pack → Seagate 6.5 GB EIDE Hard Drives 0.11 97.10 28.05 (Continued) Data Mining for Cross-Selling and Bundled Marketing Table 10-3 (Continued) Model output Rule Support (%) Lift Confidence (%) DVD player → IBM Thinkpad 0.23 12.87 27.33 DVD player → Camcorder with HD video 0.22 24.26 26.16 HP 710C inkjet printer Camcorder with HD video → Toshiba laptop 0.18 20.34 25.17 → Canon digital camera 0.27 16.38 25.00 Red Hat Linux 6.0 for Intel Systems → StarOffice 5.0 Personal Edition for Linux (Intel) 0.62 19.59 24.90 LCD Flat Panel HDTV Hard Drive Cable Pack → Sony VAIO laptop Maxtor 13.6 Gigabyte Hard Drive 0.11 81.50 24.72 0.10 52.77 24.39 → cross-sell LCD Flat Panel HDTVs to two customer segments, those who purchased VCRs and those who purchased Sony VAIO laptops. ■ Case study two: online advertising promotions Online advertising is another area where cross-selling analysis can be applied to create effective targeting strategies. Association analysis can be used to determine what additional advertisement to be offered to web visitors after they respond to a particular advertisement. The following case study illustrates this concept. Financial retailer Netting, Inc. wants to optimize the performance of its online banner advertisement, which presents discount offers on various investment products such as mutual funds, 401 K plans, and retirement plans. Performance of an online banner ad is defined as the number of leads generated by the banner in a given period of time. A lead is a website visitor who clicks on a banner ad and fills out a registration form to download a white paper posted in the ad. 241 242 Data Mining and Market Intelligence Netting has 1 year of historical data, consisting of basic information (visitor ID, name, postal address, and e-mail address) on website visitors who responded to a list of ads. Netting draws a 5% random sample (10,266 records) from the historical data and decided to use two variables, visitor ID and advertisement responded to, in the association analysis. Table 10-4 illustrates data subset used for the analysis. Table 10-4 Data subset used in association analysis in case study two Website visitor ID Banner 11 11 11 13 13 15 15 15 16 Life Insurance Home Equity ETF 401K ETF ETF Home Mortgage Home Equity Home Mortgage Model building Netting utilizes half of the data (5133 lead records) for building an association model. After de-dupping and removal of ads that do not meet the minimum product count requirement of ten, 5066 unique lead records, 2258 unique visitors and 20 ads are included in the analysis. The Association Engine displays the first ten rules as specified by the user. Table 10-5 shows the results corresponding to these ten rules. Take the first and the fifth rule as examples. The first rule indicates that those visitors who responded to mutual funds ads often also responded to ads on exchange traded funds (ETF). The three statistics (support, lift, and confidence) associated with the first rule are 1.33%, 4.18, and 46.88%, respectively. The support statistic indicates that 1.33% of the visitors in the model building data set responded to both mutual funds and ETF ads. The lift value suggests that visitors who responded to mutual funds ads were 4.18 times more likely to respond to ETF ads than randomly selected visitors were. 
The confidence statistic indicates that 46.88% of the visitors who responded to mutual funds ads also responded to ETF ads. The fifth rule in the analysis results shows that visitors who respond to home mortgage ads are likely to respond to home equity ads as well. The three statistics (support, lift, and confidence) associated with the fifth rule are 1.86%, 1.22, and 41.18%, respectively.

Table 10-5 Association analysis partial results for case study two (model building stage)

Rule                               Support (%)   Lift   Confidence (%)
Mutual Funds → ETFs                    1.33      4.18       46.88
Stocks → Home Equity                   4.38      1.32       44.59
ETFs → 401 K                           1.64      2.16       43.02
Futures → ETFs                         1.55      3.81       42.68
Home Mortgage → Home Equity            1.86      1.22       41.18
Home Mortgage → 401 K                  1.82      2.02       40.20
Fixed Income → Home Equity             1.99      1.17       39.47
Home Mortgage → Life Insurance         1.77      2.75       39.22
401 K → Home Equity                    7.62      1.14       38.31
Home Mortgage → Options                1.73      4.40       38.24

Model validation
Netting uses the remaining 5133 records for model validation. After de-duping and removal of ads that have fewer than ten transactional records, there remain 4989 unique lead records, 2302 unique visitors, and 20 ads in the analysis. The Association Engine displays the first ten rules. Table 10-6 shows the model validation results. Six out of the ten rules are consistent with the rules generated during model building:

● Futures → ETFs
● Mutual Funds → ETFs
● Home Mortgage → Home Equity
● Home Mortgage → Life Insurance
● Fixed Income → Home Equity
● 401 K → Home Equity

Consider the first and the second rule as examples. The first rule (Futures → ETFs) indicates that web visitors who responded to futures ads were likely to respond to ETF ads. For this rule, the support, lift, and confidence statistics are 1.91%, 4.12, and 53.66%, respectively. The second rule (Mutual Funds → ETFs) suggests that web visitors who responded to mutual funds ads were likely to respond to ETF ads as well. In this case, the support, lift, and confidence are 1.56%, 4.00, and 52.17%, respectively.

Table 10-6 Validation results for association analysis in case study two

Rule                               Support (%)   Lift   Confidence (%)
Futures → ETFs                         1.91      4.12       53.66
Mutual Funds → ETFs                    1.56      4.00       52.17
Home Mortgage → Home Equity            1.87      1.39       45.74
Home Mortgage → Life Insurance         1.74      2.59       42.55
Fixed Income → Home Equity             2.09      1.20       39.67
Life Insurance → Home Equity           6.34      1.17       38.62
401 K → Home Equity                    6.86      1.16       38.35
ETFs → Credit Card                     1.39      2.90       37.21
Savings → Home Equity                  3.34      1.13       37.20
Credit Card → Home Equity              4.69      1.11       36.61

The results indicate that if Netting wants to boost the number of leads from ETF ads, it should target those who have already responded to futures or mutual funds ads. To increase the number of leads from home equity ads, the firm should present its home equity ads to web visitors who have already responded to home mortgage, fixed income, or 401 K advertising. To improve the number of leads for life insurance, the firm should offer its life insurance ads to those who have responded to home mortgage ads.

CHAPTER 11
Web Analytics

According to JupiterResearch (www.jupiterresearch.com), the US online advertising market is expected to grow from $9.3 billion in 2004 to $19 billion in 2010.
This rapid growth is partially driven by the fact that with web analytics tools, data on online advertising performance can be collected, analyzed, and optimized timely and efficiently. The growth in online advertising is also driven by the rapid development and increasing influence of search marketing, a topic that we will discuss in depth in Chapter 12. In this chapter we focus on web analytics and metrics. Web analytics is often implemented separately from marketing research and data mining. Synergy among these three areas should be explored to effectively capture a bird’s eye view of the market and its customers. ■ Web analytics overview Web analytics comprises a variety of techniques for measuring and analyzing web activity. Such techniques can be implemented with a variety of tools, such as those supported by Webtrends Analytics (www.webtrends.com), Omniture SiteCatalyst (www.omniture.com), Coremetrics (www.coremetrics.com), and Google Analytics (www.google. com). The following is a list of common applications of web analytics. ● Measuring web site visitor activities across the five stages of a sales funnel (awareness, interest and relevance, consideration, purchase, loyalty, and referral): Various metrics are used to gauge visitor activities at every stage of the funnel. Examples of the metrics are number of visits, number of clicks, number of page views, navigation paths, time spent on site, number of downloads, and purchases. These visitor activities can be either organic or triggered by marketing events such as lead generation and e-commerce. Web visitor activities that result from a marketing effort can be tracked to measure the performance and impact of the marketing program. In the absence of any marketing effort, web visitor activities can be analyzed to better understand the effectiveness of web site architecture and usability. Optimization of marketing efforts is done by integrating web analytics into future marketing, campaign planning, and execution on the basis of web analytics output. Some common software tools to accomplish this include: ● ● Ad serving platforms, such as DoubleClick’s DART (www. doubleclick.com) Ad content management application 248 Data Mining and Market Intelligence ● ● ● ● Search marketing bid management software Site search system Targeting and segmentation application E-commerce platform. As previously mentioned, web analytics has made a significant contribution to the increasing importance of online advertising and marketing. Timely measurement of online marketing has been greatly facilitated by web analytics tools. However, given the multiplicity of operational metrics that can be collected, the volume of data generated by typical web analytical tools can sometimes be overwhelming. To mitigate this daunting task, it is important to identify key operational metrics that are highly correlated with the success metrics (or key performance indicator). It is not necessary to track all metrics. Rather, the focus should be on tracking and optimizing the key operational metrics to maximize marketing success. Although web analytics tools are effective in collecting web data and generating reports and analysis, human intervention is essential to correctly interpret results and derive actionable recommendations. ■ Web analytic reporting overview Standard web analytics tools provide reporting capabilities for a variety of web site visitor activities. These reports show information snapshots, and display trends over a period of time. 
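As noted earlier in this chapter, the volume of data these tools produce can be overwhelming, so a practical first step is to identify the few operational metrics that move with the chosen success metric. The sketch below is a minimal, hypothetical illustration of such a screening step using the Pearson correlation coefficient; the metric names and daily figures are invented, and in practice the inputs would be exported from the web analytics tool's reports.

```python
from statistics import correlation  # Python 3.10+: Pearson correlation coefficient

# Hypothetical daily figures exported from a web analytics tool
daily_metrics = {
    "visits":         [1200, 1350, 1100, 1500, 1650, 1400, 1580],
    "page_views":     [4800, 5200, 4300, 6100, 6600, 5500, 6300],
    "downloads":      [90, 110, 70, 150, 170, 130, 160],
    "time_on_site_s": [180, 175, 190, 172, 168, 181, 170],
}
success_metric = [14, 18, 11, 25, 28, 21, 26]  # e.g. leads generated per day

# Rank operational metrics by the strength of their correlation with the success metric
ranked = sorted(
    ((name, correlation(values, success_metric)) for name, values in daily_metrics.items()),
    key=lambda item: abs(item[1]),
    reverse=True,
)
for name, r in ranked:
    print(f"{name:15s} r = {r:+.2f}")
```

Metrics near the top of such a ranking are candidates for routine tracking and optimization; correlation alone does not establish causation, so the shortlist should still be checked against the objectives of the marketing plan.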
It is important to select truly relevant reports based on the objectives of the marketing plan. The following sections discuss the common objectives of online marketing and the role of web analytics in serving these objectives. Brand or product awareness generation The visitor ’s awareness of a particular brand or product can be measured directly through online surveys supported by numerous software tools. Most of these tools can be used to create questionnaires, administer and collect survey data, and perform analysis. Zoomerang (www.zoomerang. com) is an example of such an online survey tool. Online survey and analysis services are also provided by survey companies, such as SurveySite (www.surveysite.com). Survey studies, if conducted on a regular basis, can directly measure changes in web site visitor awareness levels. In the absence of awareness surveys, proxy metrics may be used to estimate changes in visitor awareness. These proxy metrics track web traffic volume in one form or another, and are utilized under the assumption that the more frequently visitors are exposed to web marketing, the more their awareness increases as a result. Higher traffic volume to the web site Web Analytics indicates a higher exposure to product or brand information. Common proxy metrics are: ● ● ● ● ● ● ● Number of visits Number of unique visitors Number of new visitors Number of returning (repeat) visitors Number of page views Average page views per visitor Average time spent on the site. Consider the following operational metrics that may influence changes in visitor awareness. These metrics are mainly content-related. ● ● ● ● ● Entry page Referral URL Exit page Most requested pages Search keyword. Some of these metrics, such as search keywords, addresses the question of why a visitor visits the site. As a result, better-targeted ads and messages can be created with clear understanding of the reason why visitors visit a web site. Some web analytics tools offer visitor segmentation capability for analyzing data on visitor behavior, reasons for visiting a web site, and visitor demographics. With the increasing popularity of social networking communities such as blogs, some web analytics tools now offer additional analytic capabilities to analyze data collected about community members. For instance, Omniture’s SiteCatalyst 13 (www.omniture.com) provides reporting and analysis capabilities for measuring consumption and influence of social networking and blog content. Web site content management Online publishers such as CNET.com and media site owners such as Yahoo.com generate revenues partially through reader subscriptions or sales of online real estate (web site space) to advertisers. The following is a list of key factors that online publishers or media site owners may consider for pricing. ● ● Number of impressions: An impression is an incidence where a web site visitor views particular web content. Net reach: This statistic measures the percentage of the target audience reached by a particular advertisement in each exposure. 249 250 Data Mining and Market Intelligence ● ● ● ● ● ● ● Frequency: Frequency refers to how many times a particular advertisement is exposed to the target audience over the marketing horizon. Gross rating points (GRP): This statistic measures the percentage of the target audience reached by a particular advertisement. When an advertisement is exposed to its target audience only once, GRP equals net reach. 
When there are multiple exposures, GRP equals the product of net reach and frequency. Number of clicks on an ad over a period of time. Click-through rate (also known as CTR) on an ad, defined as number of clicks divided by number of visitors over a period of time. Total number of page views over a given period of time. Average number of page views per individual over a given period of time. Average time spent in viewing an ad per individual over a given period of time. Just as visitor traffic is essential to measure visitor awareness, its significance in driving online publishing and media advertising revenues cannot be overemphasized. The reason is that data on visits can provide insight on what categories of topics and content visitors are most interested in. Visitor segmentation can be applied to the design of targeted ads. Publishers and media site owners can impose a higher CPM (cost per thousand impressions) for site sections with more web traffic and more targeted audiences. Content can be managed more effectively by analysis of metrics such as: ● ● ● ● ● ● ● ● ● Entry page Referral URL Exit page Most requested pages Most viewed topics, categories, or subcategories Most e-mailed or forwarded articles Article ratings Downloads Search keyword. Lead generation A lead is defined as an individual or a business that has a need for a product or service, and is serious about making a purchase. The purpose of a lead generation effort is to uncover such individuals or businesses and to convert them into buyers. Number of leads, lead quality, and lead conversion rate are metrics commonly used for measuring the performance of a lead generation effort. The definition of a lead varies from company to company, and Web Analytics most companies assign different grades to leads. A ‘grade A’ lead, for example, usually has a shorter purchase time frame, a higher purchase budget, or a better credit rating than a ‘grade B’ lead. The following operational web metrics can provide insight on what may cause changes in lead volume and lead quality. Some of the metrics offer information on the types of products or services that a visitor looks for, and can be leveraged to enhance the quantity and quality of leads. ● ● ● ● ● ● ● ● Entry page and referral URL: These two metrics measure sources of leads and how they impact lead quantity and quality. Sources such as search engines, online or print ads, partners, and affiliates that bring in more qualified leads are the sources that need to be emphasized through adequate investment. Exit page: This metric provides information on where visitors exit a web site. For qualified leads, their exit page is usually the lead form. Pages viewed prior to the lead form page are likely to have provided the visitor with the most appropriate information. For those who abandon the web site without becoming leads, their exit pages can provide insight as to why they drop off the process and what can be improved in terms of web page content. Most popular (most frequently traveled) navigation paths (also known as click streams): Analysis of navigation paths can provide insight on the visitor ’s thought process and lead to construction of a visitor profile (also know as visitor persona). A visitor who visits a travel web site and looks for a family vacation package in the summer is likely to have kids in school, and therefore is likely to look for destinations that suit both adults and children during summer breaks. 
Most requested pages: These pages provide contents that are in highest demand. Keywords searched by qualified leads indicate the types of products and services that these leads look for. Keywords that bring in high volume of qualified leads should be emphasized with more investments in bidding for and purchase of these keywords. Geography: Most web analytics tools provide geographic information on where visitors are physically located. Insight from geography allows for the design of more localized and better-targeted content. Lead demographic and attitudinal data: Demographic and attitudinal data collected through online registrations or lead forms provide additional data for lead profiling and segmentation analysis. Lead to click ratio: This ratio is an indication of click quality. Clicks that do not convert to leads come from individuals who browse the web site without serious intent of making a purchase. Some web analytics tools are capable of tagging individual fields in a lead form. By tracking where visitors drop off as they fill out a lead form, 251 252 Data Mining and Market Intelligence lead form designers can optimize the forms to minimize abandonment rate. It may become a challenge to measure lead conversion if leads are passed onto a sales force for offline followup and purchases. Some web analytics tool vendors have developed an automatic tracking process to tie online leads with offline purchases. One example of this type of automation process is the collaboration between WebSideStory and salesforce.com. E-commerce direct sales Web analytics are excellent tools for tracking direct online purchases. By monitoring how visitors complete or drop off from the purchase process, web sites can be optimized to increase online revenues. Most of the key metrics tracking e-commerce performance can be tracked directly or derived by web analytics tools. Here are just a few of these metrics: ● ● ● ● ● ● ● ● ● Number of transactions Number of buyers Number of transactions per buyer Total sales Average sales per transaction Average sales per buyer Total profit margin Average profit margin per buyer Average profit margin per transaction. Among operational metrics that can provide insight on what may drive online e-commerce revenues are: ● ● Entry page and referral URL: These two metrics track the sources of buyers, and how these sources impact revenues. Sources that bring in more buyers are the sources that need to be supported with additional investment. Consider an example where the majority of online buyers of an e-commerce web site have visited a partner web site before entering into the e-commerce site. The e-commerce web site owner may consider increasing his investment in this particular partner site or similar web sites. Exit page: As potential buyers arrive at the home page of an ecommerce site, they usually start with the main page of a product category and then venture into subcategory pages, place products in shopping carts, and if they proceed with the purchase they check out. Visitor abandonment rate can be minimized by tracking where potential buyers drop off in the navigation process and by focusing on areas of the web site that require improvement. Web Analytics ● ● Exit field: Some web analytics tools are capable of tracking visitor behavior as the visitor navigates and fills out online order forms. Some potential buyers drop out of the purchase process because they are asked to fill out too much information on the order form. 
Web analytics tools enable tailoring of online order forms to increase the completion rate.
● Search keyword: By quantifying marketing returns by keyword, keyword price bidding can be managed more cost-effectively. For example, a keyword that brings in 1000 clicks, 100 buyers, $5000 in revenue, and $1500 in profit yields a profit of $1.50 per click. This implies that the bidding price, measured in cost per click (CPC), needs to be kept below $1.50 to make the investment cost-effective.

Customer support and service
Increasingly, companies are incorporating customer support and service functions into their web sites to reduce customer service costs and to better serve their customers. According to industry analysts, the cost per live service phone call is ten times the cost per web self-service session (Customer Service Group 2005). Cost per case closed is an example of the key metrics that measure the performance of online customer service. Additional metrics include:
● Number of customers served in a given period of time
● Number of customer issues resolved or cases closed in a given period of time
● Improvement in customer satisfaction.

Web syndicated research
Just as syndicated research is widely available in the offline world, similar research is available for measuring online visitor behavior. Nielsen//NetRatings (www.nielsen-netratings.com), comScore (www.comscore.com), Hitwise (www.hitwise.com), Dynamic Logic (www.dynamiclogic.com), and other firms offer both syndicated and customized research on online behavior, industry analysis, competitive intelligence, and market forecasts. Syndicated research companies collect, report on, and analyze a large number and broad range of web sites to compile information about the online market. Online syndicated research can be classified into five categories, and within each category the data may be dissected by type of web site, industry, and individual web site.
● Internet audience measurement: Syndicated research in this category measures standard web traffic and visitor metrics. Nielsen//NetRatings' NetView (www.nielsen-netratings.com) and comScore's Media Metrix (www.comscore.com) offer this type of reporting and analysis through their subscription services.
● Intelligence on online advertising sales networks: This type of research focuses on tracking web traffic on advertising sales networks. Advertising.com (www.advertising.com) is an example of such a network. comScore (www.comscore.com) provides this type of specialized syndicated research through its Media Metrix service.
● Profiling and targeting of online audiences: Most major syndicated research providers offer target-audience segmentation to make online advertising more effective. Take Nielsen//NetRatings' MegaPanel as an example (www.nielsen-netratings.com). This product combines Internet behavior with attitudinal, lifestyle, and product usage data from its panel members to provide deeper insight into visitor behavior.
● Industry and competitive intelligence: This kind of research enables comparison of the performance of one site against that of competing sites, and provides industry benchmark data. Hitwise's Industry Statistics (www.hitwise.com) offers both Internet behavior and demographic profiles for over 160 industries.
● Performance of online advertising campaigns: Dynamic Logic's MarketNorms reports (www.dynamiclogic.com) are based on data collected over a large number of online campaigns.

■ References
Web self-service improves support, cuts costs. Customer Service Group Newsletter, New York (2005).

CHAPTER 12
Search Marketing Analytics

Search marketing, a specialized area of Internet marketing, has become a key marketing vehicle over the past years due to its cost effectiveness in generating leads and revenue. Appropriate web analytics tools need to be deployed to track the performance of a search marketing program. Analytic output from such tools is then used to optimize future search strategies. Without proper tracking and analysis, a search marketing program is likely to be a costly and wasteful effort. In Chapter 3, we briefly introduced search marketing as a marketing communication channel. This chapter provides a more in-depth discussion of three types of search: organic search, paid search, and onsite search. The chapter also directs the reader to numerous websites offering free search marketing analytic tools and tips.

■ Search engine optimization overview
The interface of a search engine is a web page where a user can enter a keyword to observe relevant results. The results are displayed as listings of website links on a search engine result page (SERP). There are two types of listings, free listings and paid listings (Figure 12-1). Free listings are listings of website links that are displayed free of charge to website owners. Paid listings are listings of website links shown at a cost to website owners. Paid listings are sometimes referred to as sponsored listings, and the web links they contain are sponsored links.

Figure 12-1 Listings and ad copies in a result page.

An organic search, or natural search, is a process by which a user enters a keyword and finds relevant free listings. A paid search is a process by which a user enters a keyword and finds relevant sponsored listings. Search engine optimization (SEO) is a discipline that has been developed to help website owners optimize their websites in such a manner that their website links appear in free listings visibly displayed in SERPs. An ad copy is a brief description of the content of a website. Very often each website link on a result page is immediately followed by an ad copy, as illustrated in Figure 12-1. Search engines such as Google and Yahoo use proprietary algorithms that apply different weights to a spectrum of factors in order to evaluate the relevance of a website to a particular search keyword. These factors include the frequency of keywords appearing on a website (keyword density) and link popularity, determined by the number of inbound links (backlinks) to a website. Forward links (outbound links) are the opposite of backlinks: they refer to links directing visitors out of a website. Google uses a proprietary algorithm named PageRank (Page, Brin, Motwani, and Winograd 1998) that generates a numeric value to measure the importance of a site. The algorithm by now has over 100 variables (Davis 2006). The higher the PageRank output value of a website, the more likely the website is to appear in a prominent (or high-ranking) position in Google search result pages. Search engines index billions of websites and display those that are most relevant in response to a keyword search.
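The PageRank idea can be illustrated with a small sketch. The following Python fragment runs a basic power iteration over a tiny, hypothetical link graph; it is a textbook simplification of the Page et al. (1998) formulation, not Google's production algorithm, which as noted above weighs many additional variables.

```python
# Hypothetical link graph: page -> pages it links to (forward links)
links = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}
pages = list(links)
damping = 0.85                       # standard damping factor from the PageRank paper
rank = {p: 1.0 / len(pages) for p in pages}

for _ in range(50):                  # power iteration until the ranks stabilize
    new_rank = {}
    for p in pages:
        inbound = sum(rank[q] / len(links[q]) for q in pages if p in links[q])
        new_rank[p] = (1 - damping) / len(pages) + damping * inbound
    rank = new_rank

for p, r in sorted(rank.items(), key=lambda kv: kv[1], reverse=True):
    print(p, round(r, 3))
```

Pages that collect many backlinks from other well-ranked pages end up with higher scores, which is why link popularity figures so prominently in SEO.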
To create keyword search results, search engines run automatic programs called robots (also known as spiders or crawlers) on their indices to identify contents most relevant to keywords. SEO requires specialized technical knowledge and experience in creating content relevant to keywords. Although search engine algorithms are proprietary and well-guarded trade secrets, there are general rules disseminated by search engine firms and search marketing experts on how to make website content more relevant to search keywords. Optimization following these generally accepted rules without compromising the quality of the website message and content is referred to as white hat optimization. Black hat optimization is the opposite of white hat optimization. Black hat optimization exploits potential weaknesses of the generally accepted rules to achieve high rankings. Tactics such as creating numerous irrelevant backlinks to a website is black hat optimization. Website owners that practice black hat optimization run the risk of having their websites or pages de-listed by search engines. Given these facts, caution needs to be exercised when practicing SEO. Search Marketing Analytics Website owners can submit their websites’ URLs or web pages directly to search engines submission pages such as the following (www. searchengine.com). ● ● ● http:www.google.com/addurl.html submit.search.yahoo.com/free/request search.msn.com/docs/submit.aspx An alternative way for a website to get into search engines’ listings is for a website owner to submit a website to a directory (Davis 2006). A directory organizes websites by placing them in different categories and subcategories. One of the most important directories is the Open Directory Project, or ODP (http:www.dmoz.com). To implement SEO, an organization can choose to invest in developing in-house expertise or outsource the optimization task to an outside consultant or consulting firm. In either scenario, the first step in the SEO process is to identify the website objectives. The objectives of a website may be to increase visitor traffic or to generate e-commerce sales. The second step in SEO is to conduct a site analysis to fully understand the strengths and weaknesses of the website. The final step in SEO is to identify the gap between the website’s current capability and its objectives. There are numerous online tools that allow for free site analysis for SEO. The following are some examples. ● ● ● ● ● ● ● ● ● SEO Chat (http://www.seochat.com/seo-tools) offers many free SEO tools for quick diagnoses of websites and keywords. SEO Today (http://www.seotoday.com) is a website that offers a lot of SEO related information and resources. Search Engine Marketing Professional Organization, or SEMPO (http:// www.sempo.org) SEO-PR (http://www.seo-pr.com) Search Engine Watch (http://www.searchenginewatch.com) SES/Search Engine Strategy (http://www.searchenginestrategies.com) MarketingSherpa (http://www.marketingsherpa.com) dwoz.com (http://www.dwoz.com/default.asp?Pr=123) webuildpages.com (http://www.webuildpages.com/tools/default.htm) Site analysis The objective of site analysis is to measure the website features and properties that are of importance to search engines, such as keyword density and link popularity. Keyword (search term) tracking and research tools provide information on keyword search frequency, keyword suggestions and keyword density. Keyword search frequency is the number of searches for a particular keyword for a given period of time. 
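Keyword density, one of the properties mentioned above, is straightforward to approximate. The sketch below is a simplified, hypothetical calculation over a page's visible text; real site analysis tools differ in how they tokenize text, weight HTML elements, and handle multi-word phrases.

```python
import re

def keyword_density(page_text: str, keyword: str) -> float:
    """Occurrences of a single-word keyword as a percentage of total words on the page."""
    words = re.findall(r"[a-z0-9']+", page_text.lower())
    if not words:
        return 0.0
    hits = words.count(keyword.lower())
    return 100.0 * hits / len(words)

# Toy page text (hypothetical) used only to exercise the function
sample_text = (
    "Acme Travel offers family vacation packages. "
    "Compare vacation deals, book a vacation online, and save."
)
print(f"{keyword_density(sample_text, 'vacation'):.1f}%")  # 20.0% for this toy snippet
```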
Some online 259 260 Data Mining and Market Intelligence site analysis tools generate keyword suggestions based on factors such as keyword popularity. The following links, grouped by topic, contain information useful to website owners looking to optimize the contents of their sites. ● ● ● ● ● ● Keyword popularity and frequency – http://www.wordtracker.com Keyword suggestions – http://www.adwords.google.com/select/KeywordToolExternal – http://tools.seobook.com/keyword-tools/seobook Graphically display keyword density on websites – http://www.seochat.com/seo-tools/keyword-cloud Keyword density on a web page – http://www.seochat.com/seo-tools/keyword-density Keywords trends and top keywords – http://www.about.ask.com/en/docs/iq/iq.shtml – http://www.dogpile.com/info.dogpl/searchspy – http://www.google.com/press/zeitgeist.html – http://www.50.lycos.com – buzz.yahoo.com – http://www.google.com/trends Competition finder for measuring the number of pages containing selected keywords – http:www.tools.seobook.com/competition-finder/index.php Domain analysis tools provide information about a particular domain name such as its availability, its host name, and its administrative contact. The following are some useful online domain analysis tools. ● ● ● Tracking availability and status of a particular domain name – http://www.whois.com Tracking property of a particular domain name – http://www.DNSstuff.com Tracking code to text ratio: The code to text ratio is the percentage of text in a web page. The higher the code to text ratio is, the more likely the page is to acquire higher rankings with search engines listings. – http://www.seochat.com/seo-tools/code-to-text-ratio Link popularity analysis tools measure the popularity of a website in terms of the number of links pointing to it (backlinks). The more backlinks a website has, the more popular it is and the more likely it is to achieve higher rankings with key search engine listings. Websites such as those listed below provide information on link research and tools. ● Evaluating website popularity in terms of number of backlinks with key search engines – http://www.seochat.com/seo-tools/link-popularity Search Marketing Analytics – ● ● ● ● ● http://www.seochat.com/seo-tools/multiple-datacenter-linkpopularity Reporting backlinks of a website by search engine – http://www.thelinkpop.com Reporting forward links or links inside a site (internal links) – http://www.seochat.com/seo-tools/site-link-analyzer Tracking Google PageRank of a website. The higher the PageRank of a website is, the more popular a website – http://www.seochat.com/seo-tools/pagerank-lookup – http://www.seochat.com/seo-tools/pagerank-search Checking broken redirect links of a URL – www.seochat.com/seo-tools/redirect-check URL rewriting (conversion of a dynamic URL to a static URL): Static URLs are more likely than dynamic URLs to acquire higher rankings in search engine results. The content of a static URL changes only if its HTML code is modified. The content of a dynamic URL changes when there is a change in the queries that generates the content of the web page. The following website offers a free tool for converting a dynamic URL to a static URL – http://www.seochat.com/seo-tools/url-rewriting A meta tag is an element in the head section of a web page HTML code that describes the content of the page. Meta tags are invisible to web page viewers but visible to search engines. 
Although search engines used to rely heavily on meta tags for determining the themes of web pages, the importance of meta tags has decreased lately (Davis 2006). However, it is still a good practice to make sure that meta tags accurately reflect the contents of their corresponding web pages. Free online tools such as those posted on the following websites are available for analysis of meta tags. ● ● Displaying meta tags of a web page – http://www.seochat.com/seo-tools/meta-analyzer Automatic meta tag generator – tools.seobook.com/meta-medic – http://www.seochat.com/seo-tools/meta-tag-generator – http://www.seochat.com/seo-tools/advanced-meta-tag-generator A web page title tag describes the name of a web page and resides in the head section of a web page HTML code. The name of a web page, or a web page title, appears at the very top of a web page when a web browser is activated. When a user bookmarks a web page, the web page title appears as the bookmark name. Very often a web page title, instead of the actual website URL, is listed in SERPs. Web page title tags help search engines discern the content of the corresponding web page. 261 262 Data Mining and Market Intelligence Alt text, which stands for alternate text, is text that is displayed when an image it describes cannot be shown due to technical reasons. Like meta tags, alt text is usually not visible to viewers but is visible to search engine crawlers. Search engines utilize alt text to extract the theme of a web image. Embedding keywords in alt text is a frequently used strategy for improving website PageRanks. SEO metrics SEO return metrics are determined by the business objectives of a website. In Chapter 3, we briefly discussed how return metrics vary from stage to stage of a sales funnel. The same principle applies to SEO metrics. ● ● ● ● ● In the awareness stage, the main objective of SEO is to insure a website acquires high listing rankings with search engines in order to gain as many impressions as possible. Some search engines can provide impression data if requested. The number of impressions, an approximation to the number of search engine users who enter a particular keyword and observe a website link in a SERP, is a common return metric at this stage. In the interest and relevancy stage, search engine users who view and click on a website listing become visitors to the website. In this stage, the key return metrics are the number of clicks, the number of visitors to a website, and the number of responders to marketing offers such as joining a membership program on a website. In the consideration stage, visitors become leads by responding to marketing offers such as filling out a survey form. The number of leads generated by a marketing offer is a common return metric in this stage. In the purchase stage, website visitors who purchase products online are buyers. Widely adopted return metrics in this stage are the number of buyers, the number of transactions, and the amount of transactions in a period of time. In the loyalty and referral stage, frequently used return metrics are the number of repeat purchases, the number of referrals, and customer satisfaction scores defined and measured by individual website owners. A variety of operational metrics can be tracked and improved to enhance the performance of a website and impact the success of a marketing effort. Under the context of SEO, operational metrics are often measures of website visitor quantity or quality. The following are some relevant operational metrics. 
● Keyword position (ranking): Keyword position (ranking) is an operational metric for measuring the visibility of a website. High visibility drives high volume of visitors to a website. Search Marketing Analytics ● Number of backlinks: This metric influences the rankings of a website with search engines and consequently impacts the volume of visitors to a website. The quality of website content and the effectiveness of ad copies are important qualitative factors in the success of a SEO effort. Website quality has an impact on the quantity and quality of visitors, attracting repeat and high-quality users. Ad copies, on the other hand, influence the user ’s decision to click on a particular link in a listing. ■ Search engine marketing overview Search engine marketing (SEM) refers to maximizing returns on investment in paid search keywords. According to comScore (http://www.comScore.com), the US search market in April 2006 was dominated by Google (43.1%), Yahoo (28.0%), MSN (12.9%), Time Warner/AOL (6.9%), Ask (5.8%), and myspace.com (0.6%). The following are some of the major paid search programs currently in the United States ● ● ● ● Google AdWords (adwords.google.com/select/Login) Yahoo Search Marketing (searchmarketing.yahoo.com) Microsoft adCenter (msftadcenter.com) Ask (sponsoredlistings.ask.com) To engage in paid search marketing, a website owner needs to enroll in the paid search program of a search engine firm and agree to pay a fee for each click on his website link. This fee is called Pay Per Click (PPC) or Cost Per Click (CPC). PPC is a function of keyword popularity and keyword position (ranking). The more popular a keyword is or the more visibly a website link appears on keyword search result pages, the higher a keyword PPC is. PPC is subjected to bidding and as a result changes frequently. Special software has been developed to automate management of keyword bidding by paid search program participants. Automatic bid management systems allow for management and tracking of keyword bidding by paid search program participants across multiple search engine platforms. Vendors such as Google, Yahoo, and Omniture provide bid management systems. SEM resources Keyword positions are important since listings with top positions (or high rankings) are more visible to users and therefore attract more impressions 263 264 Data Mining and Market Intelligence (visitor eyeballs). The following websites offer SEO keyword evaluation tools that can also be applied to SEM. 
● ● ● ● ● ● ● ● ● dwoz.com (http://www.dwoz.com/default.asp?Pr=123) http://www.webuildpages.com Measuring keyword popularity and frequency – http://www.wordtracker.com Keyword suggestions based on search frequency – adwords.google.com/select/KeywordToolExternal – tools.seobook.com/general/keyword Graphically display keyword density on web-sites – http://www.seochat.com/seo-tools/keyword-cloud Measuring keyword density on a web page – http://www.seochat.com/seo-tools/keyword-density Keyword position tools for top keywords or keywords of interest – about.ask.com/en/docs/iq/iq.shtml – http://www.dogpile.com/info.dogpl/searchspy – http://www.google.com/press/zeitgeist.html – 50.lycos.com – buzz.yahoo.com Keyword popularity – google.com/trends Competition finder measuring number of pages containing chosen keywords – tools.seobook.com/competition-finder/index.php Websites offering general information on SEM include ● ● ● ● Search Engine Watch (http://www.searchenginewatch.com) Search Engine Strategy or SES (http://www.searchenginestrategies.com) SEMPO: Search Engine Marketing Professional Organization(http:// www.sempo.org/home) MarketingSherpa (http://www.marketingsherpa.com) SEM metrics Unlike SEO where most costs are fixed, SEM costs are mainly variable costs originating in keyword purchases and management. PPC is an example of SEM metrics. Analysis of metrics at the keyword level, rather than analysis based on aggregate data across keywords, is crucial for measuring the cost efficiency of paid search marketing. SEM is a discipline that requires specialized technical knowledge and experience in keyword bidding and Search Marketing Analytics management. Without proper keyword bidding and management strategies, SEM can become costly and ineffective. Like SEO, SEM return metrics differ according to the business objectives and by where the audience is in the sales funnel. The same success metrics for SEO can be applied to SEM. Most paid search programs such as Google AdWords provide campaign management reports that allow paid search program participants to track operational metrics such as keyword position and clickthrough rate, where the latter is defined as the ratio of the number of clicks and the number of impressions. ■ Onsite search overview In contrast to SEO and SEM that focus on search for information residing across multiple websites over the Internet, onsite search is search for information contained within one website. Onsite search has not gained as much traction as SEO or SEM (Inan 2006), a reality that is quite unfortunate given the importance of onsite search. More resources appear to have been devoted to SEO and SEM than to onsite search. As a result, onsite search is rarely effective. It is ironic that significant investments are made in SEO or SEM to attract as many visitors as possible to a website but when they arrive, they may find the site navigation and onsite search frustrating and as a result they exit the website as quickly as possible. Onsite search is particularly important to informational sites such as CNN.com and for e-commerce sites such as Amazon.com and eBay.com. To infer the degree of importance of onsite search to a particular website, we can track metrics such as the percentage of visitors who utilize onsite search or who make decisions on the basis of onsite search. Research shows that the percent of utilization of onsite search (utilization rate of onsite search) ranges from 5 to 40% of the number of visitors (Inan 2006). 
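Because SEM costs accrue keyword by keyword, the corresponding metrics are best computed at the keyword level as well. The following sketch is a minimal illustration of deriving click-through rate, average CPC, cost per lead, and return on spend from keyword-level campaign data; the keyword names and figures are hypothetical, and in practice the inputs would come from the campaign management reports mentioned above.

```python
from dataclasses import dataclass

@dataclass
class KeywordStats:
    keyword: str
    impressions: int
    clicks: int
    cost: float      # total spend on the keyword over the period
    leads: int
    revenue: float   # revenue attributed to the keyword

    @property
    def ctr(self) -> float:              # click-through rate = clicks / impressions
        return self.clicks / self.impressions if self.impressions else 0.0

    @property
    def avg_cpc(self) -> float:          # average cost per click actually paid
        return self.cost / self.clicks if self.clicks else 0.0

    @property
    def cost_per_lead(self) -> float:
        return self.cost / self.leads if self.leads else float("inf")

    @property
    def return_on_spend(self) -> float:  # revenue generated per dollar of keyword spend
        return self.revenue / self.cost if self.cost else 0.0

# Hypothetical keyword-level figures for one reporting period
campaign = [
    KeywordStats("mutual funds",  40_000, 800, 960.0, 35, 5_200.0),
    KeywordStats("etf",           25_000, 650, 520.0, 41, 6_100.0),
    KeywordStats("401k rollover",  9_000, 120, 310.0,  4,   450.0),
]

for k in sorted(campaign, key=lambda s: s.return_on_spend, reverse=True):
    print(f"{k.keyword:14s} CTR {k.ctr:5.1%}  CPC ${k.avg_cpc:4.2f}  "
          f"cost/lead ${k.cost_per_lead:6.2f}  return/spend {k.return_on_spend:4.2f}")
```

Keywords whose cost per lead or return on spend falls outside an acceptable range are candidates for lower bids or removal, the kind of decision that aggregate, cross-keyword figures tend to obscure.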
A low utilization rate of onsite search may indicate either that onsite search is not essential to website visitors because the website content is well organized, or that onsite search is not easy to use due to various factors. If needed, website owners may consider conducting a visitor survey to understand the reason for low utilization in onsite search. Google (http://www.google.com), Atomz (http://www.atomz.com), and Mondosoft (http://www.mondosoft.com) are among the application service providers (ASP) offering onsite search services. Visitor segmentation and visit scenario analysis To effectively assist website visitors with their onsite navigation experience, website owners need to understand what audience segments visit 265 266 Data Mining and Market Intelligence their websites and what their profiles are. Based on visitor segmentation and profile information, site owners can successfully serve their visitors by generating relevant content and effectively organizing its message. Analysis of onsite search keywords can generate insight on what content might be missing on a website or might be hard to find by website visitors. In other words, an onsite search function can be either an enabling tool for website visitors to find what they need or a research tool for a website owner to understand their website visitors’ needs. It is challenging to meet every single visitor ’s needs, so site owners need to focus on their core audiences. As a result, many website owners pre-segment their visitors to ensure that their core audiences are well served by their sites. Most site owners appear to be aware of the need for segmentation and have incorporated basic segmentation into their web designs. The four common segmentation approaches are by demographics, products, verticals (industries), and geography. Many websites are structured based on one or more of these four approaches. Let’s consider a few high-tech company websites and their visitor segmentation schemes. The networking giant Cisco has a very user-friendly website (http://www.cisco.com/). The four basic visitor segments are small and medium businesses, large enterprises, service providers, and homes and home offices. The site also has a products and services section to help visitors locate their products and services of interest. The mobile phone company Motorola has five visitor segments classified on its animated website (http://www.motorola.com): consumers, businesses, governments, service providers, and developers. The company’s website also has a products and services section with detailed information about the company’s key products and services by geography. The website of Hewlett Packard (http://www.hp.com) displays five visitor sections catering to its five target customer segments: homes and home offices, small and medium businesses, large enterprise businesses, governments, health and education, and graphic art. These customer segments are fairly common in the high-tech industries due to the fact that technology solutions need to be customized to address the infrastructure and needs of businesses and consumers. Financial firms often organize their website contents by products and reasons for visits. The reasons for visit are mainly aligned with life stage events. The Bank of America website (http://www.boa.com) exhibits segmentations by demographics, products and services, and reasons for visit. 
Three basic demographic segments are visible on the BOA site: personal (consumers), small businesses, and corporations and institutions. The website also contains information about the bank’s core products and services. In addition, the website consists of contents organized by life stage events such as buying a home, purchasing a car, and planning for college. Information on life events often resonates well with consumer visitors Search Marketing Analytics as it speaks to their personal life experiences that are tied with financial needs. In addition, the site offers a visible section for existing customers to manage their accounts. Let’s now consider a website example in the manufacturing industry. The Toyota (http://www.toyota.com) website lists Toyota’s various car types (cars, trucks, SUVs/vans, and hybrids) as well as links to sections organized by reasons for visiting the site, under a header of ‘Shopping Tools.’ ● ● ● ● ● ● ● ● Build and Price Your Toyotas Comparisons by Edmunds Find Local Specials Explore Financial Tools Request a Quote Locate a Dealer Request a Brochure Certified Used Vehicles. Creating distinct sections in a website to cater to various visitor segments may enhance but does not guarantee satisfactory visitor experience. Consider the following scenario. A visitor visits a high-tech website and searches for an answer to a question regarding the expansion of his software license agreement. This visitor is a small business owner in the retail industry residing in the US. He can easily drill down to the specific demographics (small business), geography (US), product (software), and vertical (retail) sections of the website but may still get overwhelmed by the amount of information made available by the search. The visitor may need to resort to onsite search to find the answer to his question. In this instance, an onsite search function is ideal in guiding the visitor to the information he needs. Success metrics of onsite search are determined by marketing objectives and are similar to those of SEO and SEM. Keyword-level analysis is crucial for aligning an onsite search function with website content. It is a misconception that the more information there is on a site, the better the visitor experience is. Frequently, sites are cluttered with information, some of which is completely irrelevant to visitors. Through manual web log file analysis or implementation of advanced web analytics, website owners can extract the most popular keywords measured by search frequency and use this information as the basis for enhancing or eliminating particular components of the website contents. In addition to keyword search frequency, here are other metrics can be used to optimize onsite search. ● Number of searches per visitor per session: This metric measures the number of searches a visitor conducts while visiting a particular website. 267 268 Data Mining and Market Intelligence ● ● ● Number of attempts per search: This metric measures how many times a visitor enters similar phrases or keywords in his attempts to find relevant content. Number of attempts before clicking on search results: This metric measures the number of attempts a visitor makes before clicking on any of the links in the search results. A lower number indicates higher site efficiency. Customer satisfaction measured in a quantitative scale such as a scale of five with five being extremely satisfied and one being extremely dissatisfied. ■ References Davis, H. Google Advertising Tools. O’Reilly Media, Inc., 2006. 
George, D. The ABC of SEO, Search Engine Optimization Strategies. Lulu Press, 2005. Inan, H. Search Analytics: A Guide to Analyzing and Optimizing Website Search Engines. Hurol Inan, 2006. Page, L., S. Brin, R. Motwani, and T. Winograd. The PageRank Citation Ranking, Bringing Order to the Web. Stanford Digital Library Technology Projects. Stanford University, 1998. Index This page intentionally left blank A Acceptance area, of probability density curve, 130 Active listening, 12 Ad copy, 258 Addressable market, 76 Advertising determining financial returns of, 42 impact on generating marketing returns, 28 online. See Online advertising Advertorial, 52 AdWords, of Google, 265 AID. See Automatic Interaction Detection (AID) Akaike’s Information Criterion (AIC) statistic, 196 Alternative hypotheses, 130 Alt text, 262 American Express credit card. See Credit cards Analysis of dependence, 145 Analysis of interdependence, 145 Analysis of Variance (ANOVA), 172–175 ANOVA. See Analysis of Variance (ANOVA) Applet, 52 Applications, of data mining for cross-selling, 140 for customer acquisition, 140 for customer attrition, 140 for customer profiling, 140 for customer segmentation, 139 for fraud detection, 140–141 for inventory optimization, 140 for marketing program performance forecasts, 140 for personalized messages, 140 Applied Correspondent Analysis, 164, 168 Arbitron, 108 ARIMA, in time series analysis, 183, 184 Arithmetic mean, 120 Association analysis, 190 defined, 235 for e-commerce, 237–241 for online advertising, 241–244 Association Engine. See also Association analysis analysis output section, 235–236 data summary section, 236 EXCEL interface of, 235–236 input section of, 235–236 Attrition model, for customer retention, 227–229 Audience segmentation. See Customer segmentation Autocorrelation function (ACF), in time series analysis, 182–183, 185 Automatic Interaction Detection (AID), 157 Autoregressive (AR) model, in time series analysis, 180–181 Auto-regressive moving average (ARMA) models, 23 B Bank of America, 267 credit card. See Credit cards Banners. See Online banner advertisement Behavior and demographics segmentation cluster analysis for, 206 model building, 196–201 model validation, 201–205 Binary variable, 117 Binomial distribution, 123 Black hat optimization, 258 Blogs, 60 Brand awareness online survey for, 248 proxy metrics for, 248–249 Brand equity definition of, 8 value, 57 Brand recognition, 94–95 Broadcast channels, 51 Budget allocation, fractional, 31 Bundled marketing, 235. See also Cross-selling strategies Bureau of Economic Analysis, 77 272 Index Business objectives identification of, 45–46, 102 in research report, 112 Business reply, card and envelop, 55, 106 C Call centers, phone interview in, 107 Canonical correlation analysis, 175–176 CART. See Classification and Regression Tree (CART) Case studies on behavior demographics segmentation. See Behavior demographics segmentation on cross-sell. See Cross-selling strategies on customer acquisition, 221–227 on customer growth model, 229–231 on customer retention, 227–229 on customer satisfaction segmentation. See Customer satisfaction segmentation on response sgmentation, 208–210 on value segmentation, 205–208 CATI. See Computer assisted telephone interviewing (CATI) Central limit theorem, 118 CHAID. 
See Chi-Square Automatic Interaction Detection (CHAID)
Channel partners, 92
Chi-Square Automatic Interaction Detection (CHAID), 158–160, 168, 195
  for response segmentation
    model building, 209–210
    model validation, 210–211
Chi-square (χ²) distribution, 124–125
Classification and Regression Tree (CART), 70, 160–162, 195
  for customer segmentation, 90
  for value segmentation
    model building, 207–208
    model validation, 208
Click-through rate (CTR), 67, 250
Cluster analysis, 195
  for behavior and demographics segmentation, 206
    model building, 196–201
    model validation, 201–205
  description, 151
  hierarchical methods in
    agglomerative, 152–157
    divisive methods, 157–162
  partitioning methods, 162–163
  similarity measures, 152
Collaborative filtering, 190–192
Communication channels, 49–51
  determination of, 14
  multiple, 30
  reporting campaign performance on, 65–67
  types of, 12–13
Competitive intelligence, analysis of, 8
Compound annual growth rate (CAGR), 80
Computer assisted telephone interviewing (CATI), 107
comScore report, 75–76
Conjoint analysis, 186, 187–188
Constant elasticity model, 24–25
Contingency tables, 128
Continuous variable, 117
Control group, defined, 134
Corporate finance, and marketing models, 35–37
Corporate investors, 36
Corporate profits, 79
Corporate websites, 54–55
Correlation coefficient, between two variables, 126
Correspondence analysis, 164, 168–172
Covariance, 126
Cramer’s coefficient, 129
Credit cards, 80
Cross-selling, 140
Cross-selling strategies
  defined, 235
  in e-commerce. See E-commerce, cross-selling strategies in
  at Horizon
    association model for, 237–239
    model validation, 239–241
  at Netting Inc., 241–242
    association model for, 242–243
    validation results, 243–244
  in online advertising. See Online advertising
Customer acquisition, 140
  direct mailing for. See Direct mailing, for customer acquisition
Customer attrition, 140
Customer growth model, 229–231
Customer profiling, 140
Customer retention
  attrition model for, 227–229
Customer satisfaction, metrics. See Proxy metrics
Customer satisfaction segmentation
  attributes in, 212
  discriminant analysis for
    model building, 212–213
    validation, 213–217
Customer segmentation, 71, 139
  behavior demographics segmentation. See Behavior demographics segmentation
  response segmentation, 208–210
  satisfaction segmentation. See Customer satisfaction segmentation
  value segmentation, 205–208
Customer support, web analytics in, 253
Customized research. See also Marketing research
  for collecting information of market size, 75
  planning, 102–105
  vs. syndicated research, 101–102

D
Database systems, integrated, 10
Data exploration, 146
Data interval, 35
Data mining
  applications of
    for cross-selling, 140
    for customer acquisition, 140
    for customer attrition, 140
    for customer profiling, 140
    for customer segmentation, 139
    for fraud detection, 140–141
    for inventory optimization, 140
    for marketing program performance forecasts, 140
    for personalized messages, 140
  defined, 6, 139
  skills required for conducting, 14
  stepwise thought process for
    actionable recommendations, 145
    analysis conduction, 144–145
    business area determination, 142
    business issues into technical problems, translation of, 142–143
    data sources identification, 143–144
    goal identification, 141–142
    tool and technique selection, 143
  techniques. See Data mining techniques
Data mining techniques
  analysis of dependence, 145
  analysis of interdependence, 145
  ANOVA, 172–175
  association analysis, 190
  cluster analysis. See Cluster analysis
  collaborative filtering, 190–192
  conjoint analysis, 186, 187–188
  correspondence analysis, 164, 168–172
  data exploration, 146
  discriminant analysis, 166–168
  linear regression analysis
    multiple linear regression, 149–151
    simple linear regression, 146–149
  logistic regression, 188–190
  multi-dimensional scaling (MDS), 176
    metric MDS, 177, 178–179
  principal component analysis, 163, 165
  time series analysis. See Time series analysis
Data quality, 11
Debt, 36
Demographic and behavior segmentation. See Behavior and demographics segmentation
Dependent variable, 126
Diner’s Club credit card. See Credit cards
Direct mail
  market survey by, 106
  online advertising by using, 55
  operational metrics for, 67
Direct mailing, for customer acquisition, 221–222
  purchase models in
    financial impact of, 226–227
    prospect scoring, 227
    purchase after receiving catalog, 222–224
    purchase without receiving catalog, 224–225
Direct sales, 92
Discrete variable, 117
Discriminant analysis, 166–168, 195
  for customer satisfaction segmentation
    model building, 212–213
    validation, 213–217
Divisive methods, in cluster analysis, 157–162
  AID, 157
  CART, 160–162
  CHAID, 158–160
Domain analysis tools, 260
Dynamic models, 34–35

E
E-commerce
  cross-selling strategies in
    association model for, 237–239
    validation results, 239–241
  web analytics in, 252–253
Economics skills, for data mining, 14
Elasticity, of dependent and independent variables, 23–34
E-mail
  conducting market survey by using, 106
  online advertising by using, 55
  operational metrics for, 68
Engagement process, 14
Equity, 36
Exchange rate, 79
Expectation, of random variable, 119
Experimental design, 134–135
Explanatory/predictive variables, 23
Exponential distribution, 124

F
Face-to-face interviews, 108
Factor analysis, 165–166
F distribution, 125
Fixed costs, of investment, 33
Floating ad, 53
Focus groups, 109
Forrester Wave, 96
Fraud detection, data mining for, 140–141
Free listings, 257–258

G
Gamma function, 124–125
GDP. See Gross domestic product (GDP)
Geometrically distributed lag model, 35
Geometric mean, 120
Global Village, on value segmentation. See Value segmentation
Goodman-Kruskal’s coefficient, 130
Google
  AdWords, 265
  market share of, 76
  proprietary algorithm of, 258
  search engine marketing on, 53
Grid, 96
  vs. perceptual map, 98
Gross domestic product (GDP), 77
Gross national product (GNP), 77
Gross profit, defined, 27
Gross rating points (GRP), 250

H
Hierarchical methods, in cluster analysis
  agglomerative, 152–157
    Ward’s ESS, 156–157
  divisive methods
    AID, 157
    CART, 160–162
    CHAID, 158–160
Horizon, cross-selling strategies at, 236–237
  association model, 237–239
  validation results of, 239–241
HTML code, 261
Huber Sigma Corporation, 41

I
Independent variable, 126
Indirect sales, 92
Inflexion point, 26
Information technology, skills for data mining, 14
Insta-Vue, 109
Insurance purchase, 229–231
Interest rate, 79
Internet advertisement. See Online advertising
Internet marketing
  corporate websites, 54–55
  email, 55
  newsletters, 55–56
  online advertising, 51–53
  search engine marketing, 53–54
  webinars, 56
Interstitial ads, 52
Investment
  fixed cost, 42–43
  measurement with metrics, 42–43
  nonresidential equipment, 77
  software, 77
  variable cost, 43
Investors, corporate, 36

J
JupiterResearch, 247

K
Kendall’s coefficient, 127–128
Kendall-Stuart’s coefficient, 130
Keywords, 259–260
  evaluation tools, 264
Kurtosis, 122

L
Lag model, 34
  geometrically distributed, 35
Lags, 34
Leaderboard, 52
Leads, 34
  defined, 241
  generation of, 250–252
  metrics, 58
Linear model, 24
Linear regression analysis, 25, 167–168
  multiple linear regression, 149–150
    F statistic, 151
    R² measures, 150–151
  simple linear regression, 146–148
    key assumptions for, 149
Link popularity analysis tools, 260–261
Logistic model, 32
Logistic regression, 70, 188–190
LTV (lifetime value) metrics, 44

M
Magazines, 51
Magic Quadrant, 96
Manufacturing industry, website in, 267
Market growth
  affected by
    corporate profits, 79
    currency exchange, 79
    emerging technologies, 79–80
    fluctuations in oil prices, 79
    GDP growth, 77
    interest rates, 79
    political uncertainty, 77
    unemployment, 79
  trends, 80
Marketing
  channels, 30
  effort, 36
  executives, 5
  intelligence
    database systems, 10
    defined, 10
  investments, 21, 36
  messaging, 70
  plan. See Marketing plan
  products, 31
  research. See Marketing research
  research companies, 88
  returns. See Marketing returns
Marketing plan
  based on market segmentation, 85–88
  incorporating learning into, 10
  incorporating market opportunity information into, 83–84
  objective of, 35
Marketing research
  companies, 88
  defined, 4
  groups, 5
  key components, 4
  planning and implementation, 101
  primary vs. secondary data, 105–106
  report and presentation, 112–113
  skills required for conducting, 13–14, 100
  survey, 106–108
  syndicated vs. customized, 101–105
  types of, 13
Marketing returns, 21
  affected by environmental changes, 33
  impact of advertising on generating, 28
Marketing spending, 21
  models. See Marketing spending models
  by multiple communication channels, 30–31
  by multiple products, 31–32
  optimization of, 9, 27–29
Marketing spending model, 21–23
  and corporate finance, 35–37
  dynamic, 34–35
  static, 23–34
Market opportunity
  assessment of, 8
    by market growth, 80
    by market share, 80–81
    by market size, 75–76
  competitive analysis of, 8–9
  impact of macroeconomic factors on, 77–80
Market response model. See Marketing spending model
Market segmentation
  by market share, 82–83
  by market size and growth, 81–82
Market share, of a company
  of search engine companies, 76
  in terms of revenues, 80
  in terms of units sold, 81
Market size, 75–76
MDS. See Multi-dimensional scaling (MDS)
Mean, 120
Median, 120
Meta tags, 261
Metrics
  experts, 13
  investment metrics, 42–43
  operational metrics, 4, 61
  optimization, 9, 68
    analyzing data for, 70
    determining time frame of, 69–70
    identification of, 68–69
    identifying influential attributes of, 70–71
    learning from marketing campaigns for, 71
    tools for, 70
  process for identification of, 45–46
  proxy metrics, 60
  return metrics, 4, 42
  role of, 4
  selection of, 7–8
  skills, 13
Mode, 120
Modern portfolio theory (MPT), 36
Mountaineer, customer acquisition model of. See Direct mailing, for customer acquisition
Moving average (MA) model, in time series analysis, 181
MPT. See Modern portfolio theory (MPT)
MS-Office, 13
Multi-channel campaign
  cost and target volume of, 66
  learning from, 71
  performance optimization of, 67–71
  performance reporting of, 65–67
Multi-dimensional scaling (MDS), 176
  metric MDS, 177, 178–179
Multiple linear regression, 149–150
  F statistic, 151
  R² measures, 150–151

N
Netting Inc., cross-selling strategies at
  association model in, 242–243
  validation results, 243–244
  for online advertising, 241–242
Newsletters, 55–56
Newspapers, 51
Neyman-Pearson paradigm, 131
Nielsen Media Research, 108
Nominal (or categorical) variable, 117
Nonresidential equipment, 77
Nonstationary (or evolving) markets, 23
Normal distribution, 123
NPD Group, 109
Null hypotheses, 130

O
Oil prices, 79
OLAP tools, 13
Omnibus studies, 109
One-to-one marketing
  by using direct mail, 55
  by using e-mail, 55
  by using telemarketing, 56
Online advertising, 51–53, 241–242
  association model for, 242–243
  validation results, 243–244
Online banner advertisement, 52
  defined, 241
  operational metrics for, 67, 68
Online request forms, 14
Online site analysis tools, 259–260
Online survey, for brand awareness, 248
Onsite search
  description, 265–266
  for information sites, 265
  visitor segmentation for, 266–267
  visit scenario analysis, 267–268
Open Directory Project (ODP), 259
Operational metrics, 61
  for direct mail, 67
  for e-mail, 68
  for online banners, 67, 68
  for paid search, 69
Ordinal data, 117
Organic search, 258
Organization of Petroleum Exporting Countries (OPEC), 79

P
PageRank, 258
Paid listings, 258
Paid search, 258, 263–264. See also Search engine marketing (SEM)
  operational metrics for, 69
Panels, 108–109
‘Parent’ magazine, 51
Partial autocorrelation function (PACF), in time series analysis, 182–183, 185
Partitioning methods, in cluster analysis, 162–163
Pay per click, 54, 61. See also Search engine marketing (SEM)
Pearson correlation coefficient, 126–127
People Meter, 108
Percentile, 121
Perceptual map, 98–99
  advantages and disadvantages of, 100
  vs. grid, 98
Phone interview, 107
Poisson distribution, 123–124
Population, defined, 118
Pop-under ads, 53
Pop-up ads, 53
Predictive dialing, of CATI, 107
Principal component analysis, 163, 165
Probability
  defined, 118–119
  density, 119
    curve, 130, 131
  density functions
    of binomial distribution, 123
    of chi-square (χ²) distribution, 124–125
    of exponential distribution, 124
    of F distribution, 125–126
    of normal distribution, 123
    of Poisson distribution, 123–124
    of Student’s t distribution, 125
    of uniform distribution, 122
  distribution function, 119
  mass, 119
Probability mass, 119
Product features, 9
Product life cycle, 79
Project
  communication, 15
  prioritization, 14–15
Proprietary algorithms, of search engines, 258
Proxy metrics, 60
  for brand awareness, 248–249
Prozac, 46–48
Purchase models, in customer acquisition
  financial impact of, 226–227
  prospect scoring, 227
  purchase after receiving catalog, 222–224
  purchase without receiving catalog, 224–225

Q
Questionnaires. See Surveys

R
Random variables, 118
Range, 120
Rejection area, of probability density curve, 130
Request for proposal (RFP), 103, 105
Research report, 112–113
Response modeling, 9
Response segmentation
  CHAID analysis on
    model building, 209–210
    model validation, 210–211
Retention rate, 35
Return metrics, 42–43
  at awareness stage, 57
  at consideration stage, 58–59
  at interest and relevance stage, 57–58
  at loyalty and referral stage, 60
  at purchase stage, 59
  rollup of, 66
  vs. operational metrics, 61
Return on investment (ROI), 43–44
  tracking of, 44–45
Returns, measurement with metrics, 42
Revenue-driving factors, 71
Revenue flows, from direct and indirect sales, 92–93
RFID (radio frequency identification), 80
RFP. See Request for proposal (RFP)
Rich media advertising, 52
Rollover ads, 53

S
Safe Net, customer growth model of, 229–231
Sales
  direct and indirect, 92
  affected by marketing activities, 28
  stages of
    awareness, 46–48
    consideration, 48–49
    interest and relevance, 48
    loyalty and referral, 49
    purchase, 49
Sample, defined, 118
Sample size, determination of
  based on sample mean, 111
  based on sample proportion, 111
Sampling methods
  nonprobability, 110
  probability, 109–110
SCANTRACK, 109
Search engine marketing (SEM), 53–54
  described, 263
  metrics, 265
  paid search marketing, 263–264
  resources, 264
  vs. SEO, 265
Search engine optimization (SEO)
  black hat optimization, 258
  organizational implementation of, 259
  vs. SEM, 265
  website analysis in
    Alt text, 262
    domain analysis tools, 260
    link popularity analysis tools, 260–261
    meta tags, 261
    online site analysis tools, 259–260
    web page title tag, 261–262
  white hat optimization, 258
Search engines, 53–54
  marketing. See Search engine marketing (SEM)
  market share of, 76
  optimization of. See Search engine optimization (SEO)
  proprietary algorithms of, 258
  website listings in, 257–258
  websites’ submission to, 259
SEM. See Search engine marketing (SEM)
Semilogarithm model, 26
Seminars, 56
Senior managers, task of, 36
SEO. See Search engine optimization (SEO)
Shareholder value, 36
Share of wallet, defined, 71
Sigma Corporation, 41
Sigmoid functions, 32
Simple linear regression, 146–148
  key assumptions for, 149
SiteCatalyst, 13, 249
Skew, 121
Skyscraper ads, 52
Social science, skills in, 13
Software investment sectors, 77
Software tools, for web analytics, 247–248
Sommer’s coefficient, 130
Spearman’s rank correlation coefficient, 128
S-shaped model, 26
Stakeholders
  business objectives, 6–7
  identifying, 6
Standard deviation, 121
Standard error, 121
Static models, 23–34
  defined, 22
Statistics, skills of, 14
Stores, as a marketing communication channel, 56
Streaming video, 52
Student’s t distribution, 125
Superstitial ads, 52–53
Surveys, market research
  by direct mail, 106
  by email, 106
  by face-to-face interview, 108
  by telephonic interview, 107
SWOT analysis, 96–98
Syndicated research. See also Marketing research
  companies, 75
  data collection for, 105–106
  determination of market size with, 75–76
  limitations of, 90
  vs. customized research, 101–102

T
Tabulation, 95–96
Target-audience
  attributes of, 88–89
  primary vs. secondary data, 105
  segmentation of, 89–91
Target audience migration, identification of, 46–49
Telemarketing, 56
Text links, sponsored, 52
Time series analysis, 22
  ARIMA, 183, 184
  autocorrelation function (ACF), 182–183, 185
  autocorrelation in, 180
  autoregressive (AR) model, 180–181
  moving average (MA) model, 181
  objective of, 179–180
  partial autocorrelation function (PACF), 182–183, 185
Toyota, 267
Tradeshows, 56
Travel Wind survey, on behavior and demographics segmentation. See Behavior and demographics segmentation
t test, 131–133

U
UNC Bank, attrition model of, 227–229
Unemployment rate, and market growth, 79
Uniform density function, of random variable, 122
US government, and defense investment, 77

V
Value segmentation
  CART analysis of
    model building, 207–208
    model validation, 208
  defined, 206
Variable costs, of investment, 33
Variance, 120
Vendor proposal, evaluation and selection of, 105
Versatile Electronics survey, on customer satisfaction segmentation
  attributes in, 211
  discriminant analysis of
    model building, 212–213
    validation, 213–217

W
Wall Street Journal, 51
Wal-Mart, 80
Ward’s Error Sum of Squares (Ward’s ESS), 156–157
Ward’s ESS. See Ward’s Error Sum of Squares (Ward’s ESS)
War in Iraq, 77
Web analytics
  applications of, 247
  in brand awareness, 248–249
  in customer support and service, 253
  in e-commerce sales, 252–253
  in lead generation, 250–252
  software tools for, 247–248
  in syndicated research, 253–254
  in website content management, 249–250
Webinars, 56
Web page title tag, 261–262
Website analysis, in SEO
  Alt text, 262
  domain analysis tools, 260
  link popularity analysis tools, 260–261
  meta tags, 261
  online site analysis tools, 259–260
  web page title tag, 261–262
Website content management, 249–250
Websites. See also Search engines
  advertising on, 51–52
  calculating number of visitors on, 58
  corporate, 54–55
  generating online sales, 41
  listings
    free listings, 257–258
    paid listings, 258
  Prozac.com, 46
Websites’ submission, to search engines, 259
White hat optimization, 258
Wonder Electronics’ survey, on response segmentation. See Response segmentation

Y
Yahoo
  market share of, 76
  online advertising on, 51
  search engine marketing on, 52

Z
Zoomerang, 248
Z test, 131