Data Mining and Market
Intelligence for Optimal
Marketing Returns
Susan Chiu
Domingo Tavella
AMSTERDAM • BOSTON • HEIDELBERG • LONDON • NEW YORK • OXFORD
PARIS • SAN DIEGO • SAN FRANCISCO • SINGAPORE • SYDNEY • TOKYO
Butterworth-Heinemann is an imprint of Elsevier
Linacre House, Jordan Hill, Oxford OX2 8DP, UK
30 Corporate Drive, Suite 400, Burlington, MA 01803, USA
First edition 2008
Copyright © 2008 Susan Chiu and Domingo Tavella.
Published by Elsevier Inc. All rights reserved
The right of Susan Chiu and Domingo Tavella to be identified as the authors of this work
has been asserted in accordance with the Copyright, Designs and Patents Act 1988
No part of this publication may be reproduced, stored in a retrieval system
or transmitted in any form or by any means electronic, mechanical, photocopying,
recording or otherwise without the prior written permission of the publisher.
Permissions may be sought directly from Elsevier's Science & Technology Rights
Department in Oxford, UK: phone (+44) (0) 1865 843830; fax (+44) (0) 1865 853333;
email: permissions@elsevier.com. Alternatively you can submit your request online
via the Elsevier homepage (http://elsevier.com), by selecting “Support & Contact”
then “Copyright and Permission” and then “Obtaining Permission”.
Notice
No responsibility is assumed by the publisher for any injury and/or damage to persons
or property as a matter of products liability, negligence or otherwise, or from any use or
operation of any methods, products, instructions or ideas contained in the material herein
British Library Cataloguing in Publication Data
A catalogue record for this book is available from the British Library
Library of Congress Cataloging-in-Publication Data
A catalog record for this book is available from the Library of Congress
ISBN: 978-0-7506-8234-3
ISBN: 978-0-7506-7980-0
For information on all Butterworth-Heinemann publications
visit our web site at http://books.elsevier.com
Typeset by Charon Tec Ltd., A Macmillan Company.
(www.macmillansolutions.com)
Printed and bound in the United States of America
08 09 10 11
10 9 8 7 6 5 4 3 2 1
Contents

Preface
Biographies

1  Introduction
   Strategic importance of metrics, marketing research and data mining in today's marketing world
   The role of metrics
   The role of research
   The role of data mining
   An effective eight-step process for incorporating metrics, research and data mining into marketing planning and execution
   Step 1: identifying key stakeholders and their business objectives
   Step 2: selecting appropriate metrics to measure marketing success
   Step 3: assessing the market opportunity
   Step 4: conducting competitive analysis
   Step 5: deriving optimal marketing spending and media mix
   Step 6: leveraging data mining for optimization and getting early buy-in and feedback from key stakeholders
   Step 7: tracking and comparison of metric goals and results
   Step 8: incorporating the learning into the next round of marketing planning
   Integration of market intelligence and databases
   Cultivating adoption of metrics, research and data mining in the corporate structure
   Identification of key required skills
   Creating an effective engagement process
   Promoting research and analytics

2  Marketing Spending Models and Optimization
   Marketing spending model
   Static models
   Dynamic models
   Marketing spending models and corporate finance
   A framework for corporate performance marketing effort integration

3  Metrics Overview
   Common metrics for measuring returns and investments
   Measuring returns with return metrics
   Measuring investment with investment metrics
   Developing a formula for return on investment
   Common ROI tracking challenges
   Process for identifying appropriate metrics
   Identification of the overall business objective
   Understanding the impact of a marketing effort on target audience migration
   Selection of appropriate marketing communication channels
   Identification of appropriate return metrics by stage in the sales cycle
   Differentiating return metrics from operational metrics

4  Multi-channel Campaign Performance Reporting and Optimization
   Multi-channel campaign performance reporting
   Multi-channel campaign performance optimization
   Uncovering revenue-driving factors

5  Understanding the Market through Marketing Research
   Market opportunities
   Market size
   Factors that impact market-opportunity dynamics
   Market growth trends
   Market share
   Basis for market segmentation
   Market segmentation by market size, market growth, and market share: case study one
   Using market research and data mining for building a marketing plan
   Marketing planning based on market segmentation and overall company goal: case study two
   Target-audience segmentation
   Target-audience attributes
   Types of target-audience segmentation
   Understanding route to market and competitive landscape by market segment
   Routes to market
   Competitive landscape
   Competitive analysis methods
   Overview of marketing research
   Syndicated research versus customized research
   Primary data versus secondary data
   Surveys
   Panel studies
   Focus groups
   Sampling methods
   Sample size
   Research report and results presentation
   Structure of a research report

6  Data and Statistics Overview
   Data types
   Overview of statistical concepts
   Population, sample, and the central limit theorem
   Random variables
   Probability, probability mass, probability density, probability distribution, and expectation
   Mean, median, mode, and range
   Variance and standard deviation
   Percentile, skewness, and kurtosis
   Probability density functions
   Independent and dependent variables
   Covariance and correlation coefficient
   Tests of significance
   Experimental design

7  Introduction to Data Mining
   Data mining overview
   An effective step by step data mining thought process
   Step one: identification of business objectives and goals
   Step two: determination of the key focus business areas and metrics
   Step three: translation of business issues into technical problems
   Step four: selection of appropriate data mining techniques and software tools
   Step five: identification of data sources
   Step six: conduction of analysis
   Step seven: translation of analytical results into actionable business recommendations
   Overview of data mining techniques
   Basic data exploration
   Linear regression analysis
   Cluster analysis
   Principal component analysis
   Factor analysis
   Discriminant analysis
   Correspondence analysis
   Analysis of variance
   Canonical correlation analysis
   Multi-dimensional scaling analysis
   Time series analysis
   Conjoint analysis
   Logistic regression
   Association analysis
   Collaborative filtering

8  Audience Segmentation
   Case study one: behavior and demographics segmentation
   Model building
   Model validation
   Case study two: value segmentation
   Model building
   Model validation
   Case study three: response behavior segmentation
   Model building
   Validation
   Case study four: customer satisfaction segmentation
   Model building
   Validation

9  Data Mining for Customer Acquisition, Retention, and Growth
   Case study one: direct mail targeting for new customer acquisition
   Purchase model on prospects having received a catalog
   Purchase model based on prospects not having received a catalog
   Prospect scoring
   Modeling financial impact
   Case study two: attrition modeling for customer retention
   Case study three: customer growth model

10  Data Mining for Cross-Selling and Bundled Marketing
    Association engine
    Case study one: e-commerce cross-sell
    Model building
    Model validation
    Case study two: online advertising promotions
    Model building
    Model validation

11  Web Analytics
    Web analytics overview
    Web analytic reporting overview
    Brand or product awareness generation
    Web site content management
    Lead generation
    E-commerce direct sales
    Customer support and service
    Web syndicated research

12  Search Marketing Analytics
    Search engine optimization overview
    Site analysis
    SEO metrics
    Search engine marketing overview
    SEM resources
    SEM metrics
    Onsite search overview
    Visitor segmentation and visit scenario analysis

Index
Preface
Over the last several decades, Marketing Research has been benefiting from
the ever-increasing wave of quantitative innovation in fields that have been
traditionally regarded as the purview of softer disciplines. The rising level
of quantitative education in the marketing research community, the extraordinary wealth of information accessible on the Internet, along with fierce
competition for customers conspire to create a growing need for sophisticated applications of data-mining, statistical, and empirical methodologies
to the formulation and implementation of marketing plans.
As business experience is increasingly informed by the results of rigorous analysis, it becomes ever more clear that the application of quantitative modeling techniques in marketing has a direct effect on the bottom
line. In the extremely competitive environment of the global economy,
the potential high price of a misdirected marketing effort is made unacceptable by the abundance of information that, if properly extracted and
interpreted, can guide the effort to success.
This book’s primary audience is the quantitative middle of the marketing professional spectrum. The primary objectives of the book are to
distill and present a portfolio of techniques and methods of demonstrable efficacy in the design, implementation, and continued assessment of
a marketing effort. The selection of techniques and the extent and depth
of coverage of the quantitative background needed for their practical use
have benefited from our experience in practical marketing research and
quantitative modeling. The resolution of business issues and the practicality of implementation have been our most important guiding principles in covering the material.
The materials we discuss are essential components in today’s sophisticated quantitative marketing professional’s toolbox. The mathematical
and statistical issues whose understanding is required to ensure the correct interpretation of the various methodologies and their outputs are
introduced with minimal complexity. The emphasis is on practical applications, exemplified with case studies implemented in standard computational analysis environments, such as SAS and SPSS.
There are three main components in the coverage of the book. The
first component refers to the importance and integration of marketing
research, metrics, and data mining into the marketing investment process.
The second is a detailed discussion of marketing research and data mining methods with a view to solving the practical needs of marketing-effort
design and implementation. The third thrust of the book is the application of the methodology to illustrative case studies, representative of the
common practical challenges marketing professionals confront.
San Francisco
September 2007
Susan Chiu
Domingo Tavella
Biographies
Susan Chiu
Susan Chiu is currently Director of Business Intelligence at Ingram Micro,
Inc., where she is responsible for advanced analytics and marketing
research consulting. Susan Chiu has over 15 years of quantitative marketing research experience and has held positions in analytics, data mining, and business intelligence with Cisco Systems, Wells Fargo, Providian
Bancorp, and Safeway Corporation. Susan Chiu has a Master's degree in
Statistics from Stanford University.
Domingo Tavella
Domingo Tavella is Principal of Octanti Associates, a consulting firm
focused on advanced quantitative modeling in finance and marketing.
Dr. Tavella has over 25 years of mathematical and computational modeling experience in fields ranging from aerodynamic design, biomedical
simulation, computational finance, and marketing modeling. He holds a
Ph.D. in Aeronautical Engineering from Stanford University and an MBA
in Finance from UC Berkeley.
CHAPTER 1
Introduction
■ Strategic importance of metrics, marketing research and data mining in today's marketing world
Today’s marketing executives are under significant pressure to be
accountable for their companies’ returns on investment both in the
boardroom and in front of their shareholders. The following excerpt
from Business Week by Brady, Kiley, and Bureau Reports (Farris, Bendle,
Pfeifer and Reibstein 2006) vividly encapsulates this shift in what is
expected of marketing executives.
‘For years, corporate marketers have walked into budget meetings like
neighborhood junkies. They couldn’t always justify how well they spent
past handouts or what difference it all made. They just wanted more
money – for flashy TV ads, for big-ticket events, for you know, getting out
the message and building up the brand. But those heady days of blind
budget increases are fast being replaced with a new mantra: measurement
and accountability.’
As pressure for accountability cascades through an organization, every
functional group is under scrutiny, and those who cannot quantify their
impact on generating satisfactory returns on investment are placed in a
vulnerable position. At downsizing or budget reduction time, marketing
executives are in the front line. Marketing, as it turns out, is among those
corporate functions that are under the closest scrutiny.
In recent years, there has been increased awareness and a stronger
motivation among marketing professionals to quantify returns on investment. However, there is also a challenge in selecting the proper tools for
measuring market returns from the large number of strategic and analytic
tools that have emerged in the past decade.
Planning, research, execution, and optimization are the four key stages
in marketing efforts. The objective of the planning stage is to define the
appropriate metrics for measuring marketing returns. The number of
metrics needs to be kept under control to ensure that the measuring task
is achievable. In the research stage, marketing research is done to have
a better understanding of the overall market opportunities and the competitive landscape. In the execution stage, effective implementation is an
essential requirement for the success of the marketing effort. In the optimization stage, marketing strategies and tactics are optimized and fine-tuned on an ongoing basis.
■ The role of metrics
In the previous section, we alluded to the need for defining marketing
metrics at the planning stage. A metric is a variable that can be measured
and used to quantify the performance of a marketing effort.
Metrics fall into the following categories: return metrics, investment
cost metrics, operational metrics, and business impact metrics. It is important to understand the roles that different types of metrics play.
Return metrics are often referred to as key performance indicators
(KPI) or success metrics. The costs of marketing programs, goods sold,
and capital are investment cost metrics that must be optimally related to
metrics measuring investment returns. Operational metrics influence the
performance of return metrics (most of the metrics we consider fall under
this category), and a thorough understanding of their impact on return
metrics is essential in order to track those with the highest potential. One
common mistake in marketing is to invest significant resources to track
hundreds of operational metrics without precisely quantifying whether
they significantly influence success.
Finally, it is essential to understand how marketing investment impacts
a company’s financial performance. Ideas such as cash flow analysis or
economic value added (EVA) have been utilized to link marketing investment and company financial performance (Doyle 2000).
■ The role of research
In essence, marketing research consists of the discovery and analysis of
information for understanding the market that a particular firm faces. The
America Marketing Association (AMA) offers a comprehensive definition
of marketing research (Bennett 1988).
‘Marketing research links the consumer, customer, and public to the
marketer through information – information used to identify and define
marketing opportunities and problems; generate, refine, and evaluate
marketing actions, monitor marketing performance; and improve understanding of marketing as a process. Marketing research specifies the information required to address these issues; designs the method for collecting
information; manages and implements the data collection process; analyzes the results; and communicates the findings and their implications.’
Since customers are key components of a market, customer research
should also be considered as part of marketing research.
Marketing research has been present in the corporate world for decades. Its applications mainly focus on market sizing, market share analysis, product concept testing, pricing strategies, focus groups, brand
perception studies, and customer attitude or perception research. The
following examples are typical applications of marketing research to
address business problems. Although these examples remain fairly common marketing research applications, they are somewhat limited in the
whole scheme of marketing investment.
● Running a focus group to evaluate customer experience in certain retail bank branches
● Determining the feasibility of a full product rollout by first conducting a test in a small and easy-to-control market
● Conducting a recall test to determine a TV advertisement's impact on product awareness
● Compiling market share information for a briefing to a group of industry analysts
● Conducting a focus group to evaluate new product features.
Marketing research groups are often spread across various corporate
functions such as corporate communications, public relations, corporate
marketing, segment marketing, vertical marketing, business units, and
sales. Under such an organizational setup, the various marketing research
efforts in a particular firm serve specific purposes and are sometimes disconnected from each other. In recent years, there has been recognition
that optimal synergy among research teams requires centralization of the
marketing research teams.
The recent economic climate has fostered a broader application of
research to marketing investment. For securing resources and funding, marketing investment plans need to be justified by a reasonable level of returns,
and this justification needs to be backed up by facts, forecasts, data, and
analysis of opportunities. Marketing research generates market opportunity information ideal for supporting such marketing investment plans. For
instance, one important question to address is the geographical allocation of
marketing investment. Marketing research can be used to determine market
opportunities by geography and to drive optimal investment decisions.
With increasing frequency, marketing executives at major corporations are asked to submit their annual budget plans with forecast of corresponding returns on investment. The best practices report Maximizing
Marketing ROI by the American Productivity and Quality Center (APQC)
in conjunction with the Advertising Research Foundation (ARF) reported
the following findings (Lenskold 2003).
● The pressure is on marketing to demonstrate a quantifiable return and on CEOs to deliver value to their stockholders and business alliance partners
● ROI-based marketing is sought by more marketers
● ROI-based models encourage decision makers to challenge and revise the budgeting process.
■ The role of data mining
Berry and Linoff give the following definition of data mining (Berry and
Linoff 1997).
Data mining is the process of exploration and analysis, by automatic
or semi-automatic means, of large quantities of data in order to discover
meaningful patterns and rules.
In essence, data mining is the application of statistical methodologies
to analyzing data for addressing business problems. While marketing
research allows for opportunities to be identified at a macro level, data
mining enables us to discover granular opportunities that are not immediately obvious and can only be detected through statistical techniques
and advanced analytics. High-level insights provide directional guidance while granular detail facilitates optimization, execution, and tactics.
Insight garnered through marketing research can help drive data mining
analysis by providing the initial direction. Conversely, results from data
mining analysis can be used to refine high-level strategies. Marketing
research and data mining are two disciplines that are complementary to
each other, and there is growing awareness of the value added that these
two disciplines combined can provide.
■ An effective eight-step process for incorporating metrics, research and data mining into marketing planning and execution
The following flowchart (Figure 1-1) summarizes a step-by-step approach for incorporating metrics, marketing research, and data mining into marketing planning and execution.

[Figure 1-1: An effective eight-step process for incorporating metrics, marketing research, and data mining into marketing planning and execution.]
Step 1: identifying key stakeholders and their
business objectives
It is crucial to identify the key stakeholders that will support a market
effort and those that will implement the recommendations from research
and analysis. Buy-in from key stakeholders throughout the process is
essential for getting analytic results accepted and implemented.
Key stakeholders need to quantify their business objectives and define
such objectives as goals to be achieved. An objective might be to increase
sales revenue. An increase in revenue by a specific percentage over the
course of a year, therefore, is a quantified objective to be reached. An
objective that is not quantified is hard to measure against and should not
be used to derive investment strategy.
Step 2: selecting appropriate metrics to measure
marketing success
A marketing plan requires that metrics be clearly defined from the
outset since the selection of appropriate metrics can direct resources to optimal use. Multiple metrics may need to be examined simultaneously to glean
the insights we seek. Multiple metrics can be used to validate one another
and maximize the accuracy of the information gathered. A single metric
such as revenue growth alone might not shed as much light on the true
opportunity as revenue growth and market share information combined.
If the business objective is to increase sales revenue by a given amount,
then naturally sales revenue is the appropriate return metric to track. In
the case of an advertising program, brand equity (the monetary worth of
a brand) and brand awareness are the appropriate return metrics to track.
Brand equity is defined as the net present value of the future cash flow attributable to the brand name (Doyle 2000) while brand awareness is the level of
exposure and perception that customers have about a particular brand.
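To make the brand equity definition concrete, here is a minimal Python sketch (our illustration, not from the book; the cash flows and discount rate are assumed) that computes brand equity as the net present value of a stream of cash flows attributable to the brand.

```python
# Hypothetical illustration: brand equity as the net present value (NPV) of
# future cash flows attributable to the brand name (Doyle 2000).
cash_flows = [120.0, 130.0, 135.0, 140.0, 140.0]  # assumed annual cash flow attributable to the brand, $M
discount_rate = 0.10                               # assumed cost of capital

brand_equity = sum(cf / (1 + discount_rate) ** (t + 1)
                   for t, cf in enumerate(cash_flows))
print(f"Brand equity: ${brand_equity:.1f}M")       # about $500.5M for these inputs
```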
Step 3: assessing the market opportunity
Market opportunity assessment consists of addressing four fundamental
questions.
1. Where are the market opportunities?
2. What are the market segments?
3. What is the size of each segment?
4. How fast does each segment grow?
Market opportunity information can be acquired through multiple
approaches. One approach is through exploration of publicly available
news and existing company internal data. Another approach is leveraging
third party marketing research sources, which offer a wide range of forecasts on market opportunities by segment. These forecasts, which consist of
both opportunity size and growth information, tend to be driven by different assumptions. In situations where market opportunity information is not
readily available, customized research is required to gather the information.
Step 4: conducting competitive analysis
In the absence of competition, a company can take full advantage of market opportunities. With competition, however, companies can only realize
market opportunities by understanding and outperforming their competitors. As Aaker points out, one important reason why the Japanese automobile firms were able to penetrate the US market successfully, especially
during the 1970s, is that they were much better than US firms at doing
competitive analysis. David Halberstam described the Japanese effort
at competitor analysis in the 1960s: ‘They came in groups… They measured, they photographed, they sketched, and they tape-recorded everything they could. Their questions were precise. They were surprised how
open the Americans were’ (Aaker 2005).
Competitive intelligence is an extremely important discipline in the
world of marketing research and data mining. A combination of survey
data and real life transaction data can be used to analyze and track competitive information.
Part of a competitive intelligence analysis is to objectively assess product features, pricing, and brand value of the key players in a market.
Product features that meet customer needs represent competitive advantages, and pricing is often used as a tool for gaining market share at the
expense of profitability. Since brand perception often affects purchasing
decisions it is important to incorporate brand strength and weakness
analysis into competitive intelligence.
Step 5: deriving optimal marketing spending
and media mix
After the fundamental information on market opportunities and competitor
landscape has been collected, we proceed to determine the optimal marketing spending given a business objective. As we will elaborate in Chapter 2,
there are numerous analytical approaches for modeling optimal marketing spending. Optimization involves maximization or minimization of a
particular metric, such as maximizing profit or minimizing risk.
Maximizing profit is the most common objective in optimization of marketing spending. Some companies may choose to maximize revenue regardless of profitability, but doing so imperils the firm’s long-term value.
Step 6: leveraging data mining for optimization
and getting early buy-in and feedback from
key stakeholders
The high-level and directional insights into market opportunities provided by marketing research serve as the foundation for building a high-level marketing strategy. However, implementation of a high-level strategy
through tactics requires significant analytical work. This is where data mining adds value by delineating a ‘how to’ road map to realize the opportunities uncovered by research. Marketing research could identify a geographic
area as the best opportunity. Since it is very costly to target every prospect
in this geography, it is necessary to select a target list for a marketing campaign, which requires building a response model to predict the likelihood
of a prospect’s response. Response modeling requires statistical data mining techniques such as trees and logistic regression. Soliciting key stakeholders’ feedback and input in the data collection, research and data mining
processes can help fine-tune the accuracy and objectivity of the data mining
effort by removing potential roadblocks and barriers in the processes.
Step 7: tracking and comparison of metric
goals and results
In the final presentation on the performance of a marketing campaign, it
is essential to compare results derived from the application of the selected
key metrics against the initial business goal. In a successful marketing
campaign where goals are achieved, effective strategy and tactics can be
applied to future campaigns. In a failed marketing campaign where the
result trails the goal, areas of improvement for strategy and tactics can be
identified for improving the performance of future campaigns.
The final presentation of any research or data mining project is a decisive factor for the success of the project. Good research or data mining
work poorly presented will fail to gain adoption and implementation. We
have been the victims of speakers who did not know how to ‘work an
audience’, to bring them to the point where they are quite ready to accept
what is being recommended (Blankenship and Breen 1995).
Step 8: incorporating the learning into the next
round of marketing planning
Learning from past programs needs to be incorporated into the next round of marketing planning as an ongoing optimization process, a practice that ultimately leads to a competitive advantage. Over time, this learning accumulates into internal and proprietary market and customer intelligence to which competitors have no access.
■ Integration of market intelligence and databases
Market intelligence refers to insights generated from marketing research or
data mining. Market intelligence provides the maximum value and insight
when its components and parts are woven together to depict an overall
picture of the market opportunities and challenges. Information on revenue growth, competitors, or market share in isolation does not provide
significant value, since a company may be growing its revenue but at the
same time losing market share if its competitors grow faster. Information
on past customer purchase data can often be misleading if the future needs
of the same customers differ drastically from their past needs.
To facilitate building market and customer intelligence, it is necessary to have integrated database systems that link together data from
sales, marketing, customer, research, operations, and finance. Although
not a requirement, ideally all the data would be maintained on the same
hardware system. If there is more than one single database, then marketing, sales, customer, research, and finance databases need to be related
through some sort of identifier, such as a customer ID, campaign or
program code, date of purchase, and transaction ID.
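As a toy illustration of such linkage (ours; the record layout and field names are hypothetical), the snippet below relates marketing and sales records through a shared customer ID:

```python
# Hypothetical sketch: relating marketing and sales records through a shared customer ID.
campaign_touches = {"C010": "FALL07-EMAIL", "C011": "FALL07-CATALOG"}    # customer ID -> campaign code
sales = [("C010", "2007-10-02", 480.0), ("C012", "2007-10-05", 120.0)]   # (customer ID, purchase date, amount)

for customer_id, purchase_date, amount in sales:
    campaign = campaign_touches.get(customer_id, "no recorded touch")
    print(customer_id, purchase_date, amount, "->", campaign)
```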
Marketers often encounter data quality challenges. The following is a
list of common data quality issues (Groth 2000).
● Redundant data
● Incorrect or inconsistent data
● Typos
● Stale data
● Variance in defining terms
● Missing data.
The best strategy to deal with data quality is to make sure that key
stakeholders are fully aware of the imperfections of any data issues. Very
often these same key stakeholders can help drive efforts for cleaning and
standardizing the data.
Poor data quality arises due to many factors, not the least of which is
erroneous data from original data source systems. These source systems
may include systems of Enterprise Resource Planning (ERP), Point of Sale
(POS), Financial, Customer Relationship Management (CRM), Supply
Chain Management (SCM), marketing research, campaign management,
advertising servers, e-mail delivery, web analytic tools, and call centers.
Firms should consider establishing an automatic process that checks and
corrects data input into the source systems.
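A minimal sketch of such a check (ours, with assumed field names, not a prescription from the book) might flag duplicate transaction IDs and missing customer IDs before records are loaded:

```python
# Hypothetical input check: flag duplicate transaction IDs and missing customer IDs.
records = [
    {"transaction_id": "T001", "customer_id": "C010", "amount": 250.0},
    {"transaction_id": "T002", "customer_id": None,   "amount": 99.0},
    {"transaction_id": "T001", "customer_id": "C010", "amount": 250.0},  # redundant record
]

seen, issues = set(), []
for rec in records:
    if rec["transaction_id"] in seen:
        issues.append(("duplicate transaction", rec["transaction_id"]))
    seen.add(rec["transaction_id"])
    if not rec["customer_id"]:
        issues.append(("missing customer_id", rec["transaction_id"]))

print(issues)  # [('missing customer_id', 'T002'), ('duplicate transaction', 'T001')]
```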
Market opportunity forecasts created by internal departments may vary
from those provided by external research firms. The former are often used
for setting sales goals and as a result tend to be more conservative while the
latter tend to be more aggressive to accommodate a broad set of objectives
and assumptions of research subscribers. This difference may lead to inconsistencies that make it difficult to assess the accuracy of the data.
■ Cultivating adoption of metrics, research and data mining in the corporate structure
Given the importance of metrics, research, and data mining, having a team
specialized in these areas working closely with all key business functional
groups can be a competitive advantage. In high-tech industries where sales
and marketing groups are often run as separate groups, it is imperative that
a dedicated analytic team interface with both marketing and sales groups
to ensure proper planning and execution. When sales and marketing agree
upon common metrics for setting their benchmarks, the two groups can
work effectively together. If sales and marketing have different assessments
of market potential, the two groups will likely create unsynchronized or
even conflicting goals in their marketing and sales programs, which may
result in suboptimal execution of the overall marketing effort.
The following are additional tips for successfully incorporating research
and analytics into the corporate structure.
Identification of key required skills
Skills in three key disciplines, metrics measurement, marketing research,
and data mining, are required for assembling a successful research and
analytic team effort. Besides discipline-specific capabilities, there are additional skills that are common requirements across the three disciplines.
Common required skills
Clear communication enables a research and analytic team to effectively
acquire feedback and articulate findings, thereby facilitating buy-in from
key stakeholders. Many analytic professionals are used to communicating in technical terms and have difficulty translating technical terms into
plain everyday language. This imposes an extra burden on analytic professionals when explaining analytic concepts to their nonanalytic peers.
Two of the most common communication issues are a lack of a clear
understanding of the questions asked, and the tendency to give unnecessary information when delivering an answer.
An executive who asks a question ‘What is the expected return of the
program?’ expects to get a response clearly stating the expected return.
Rather than giving a direct answer, many analytical professionals tend
to give a vague response and then quickly go on and elaborate on the
data mining techniques applied even when the executive does not specifically ask about the data mining techniques being used. The first step
toward resolving this communication issue between analytic and nonanalytic professionals is to cultivate ‘active listening’ skills. Active listening
requires understanding of what others ask before giving replies.
Another required common skill is the ability to focus on truly important tasks and to be able to prioritize tasks based on predetermined criteria, a significant challenge when confronting multiple projects. One way to
facilitate focus and prioritization is to establish and formalize a standard
engagement process where given criteria are used to determine the priority
level of a project. Such criteria may include expected return on investment,
turnaround time, resource requirement, revenue potential, and risk level.
Another required skill across metrics, research, and data mining is experience and training in marketing and knowledge of the company's line of
business. The type of marketing experience and training required depends
on the overall company marketing culture and use of communication
media. Some corporations rely on traditional marketing communication
channels such as print and catalog while others focus on new media such
as e-mail, search, blog, iPod, and web marketing. Familiarity with the specific types of marketing communication channels that a firm uses allows
for derivation of deeper insights from analysis and more substantial business recommendations.
Metrics-specific required skills
Metrics-specific skills are also called measurement skills, which in the marketing consulting world refer to the identification and tracking of marketing
campaign performance. Metrics skills include hands-on experience in tracking and measuring performance of a wide array of marketing communication channels. These communication channels include, but are not limited
to, TV, radio, direct mail, e-mail, telemarketing, web marketing, online or
print advertising, search marketing, social marketing (blog, community
marketing), and podcast. Usage of metrics does not require advanced data
mining or statistical skills; rather it requires hands-on experiences in marketing campaign planning, management, execution, and performance
tracking and analysis.
Metrics experts are expected not only to have extensive understanding of marketing channels and programs, but also to have clear insights
into what is important to the overall marketing business. Before selecting any metrics, metrics experts conduct discovery meetings with the key
stakeholders to fully understand their goals and propose metrics that are
aligned with these goals.
Metrics identification and measurement benefit greatly from strong
reporting skills, such as the ability to create reports using standard tools such as Excel and Access, and OLAP tools such as Business Objects, Brio, Crystal Reports, and Cognos.
Finally, metrics expertise also includes an understanding of both the
potential and the limitations of data for constructing or deriving metrics.
Practitioners should seek alternative data sources or metrics if the
existing sources have poor data quality.
Research-specific required skills
There are two basic types of marketing research: syndicated and customized research. Marketing research skills are often acquired through
training in the social science disciplines. Syndicated research expertise includes experience in effective acquisition of data from syndicated
research vendors, and management of vendor relationships and research
subscriptions. Customized research expertise consists of skills for designing and managing projects, survey research, focus groups, vendor selection, requests for proposal (RFP) process, and presentations of findings
and results. Industry and product knowledge is also an important
required attribute for customized research in that it allows for better
decisions over vendor selection and extraction of insight from studies.
Knowledge and experience in economics, which entail skills in collecting, analyzing, and interpreting market trends, economic climate data,
and economic impact on market opportunities, are valuable attributes for
both syndicated research and customized research.
Data mining-specific required skills
Practice of the data mining discipline is driven by two main skill sets: statistics and information technology. The required statistical skills include
the abilities to conduct exploratory analysis and to apply a broad range of
data mining techniques to solving marketing problems. The information
technology skills include expertise in database structure, data processing
and quality control, and data extraction skills such as Structured Query Language (SQL).
Creating an effective engagement process
An engagement process needs to be in place to effectively manage and
enhance the research and analytic efforts. Without an engagement process, a research and analytic team passively takes on ad-hoc requests
where prioritization may be based solely on the particulars of the corporate pecking order without regard for the most relevant business
objectives. In such situations efforts may be driven and prioritized by
rationales other than project returns on investment.
The following list describes a step-by-step engagement process.
1. Choosing a point of contact person within a research and analytic team:
This individual should have an overall understanding of the capabilities of the team. Ideally, the point person should be a recognized project
manager who can effectively manage timelines, collect requirements,
transmit the requirements back to the team, and build relationships.
2. Determining the communication channels through which a research
and analytic team can be engaged: There are numerous ways in which
an engagement can take place, such as phone, e-mail, and the web.
Online request forms can be used to gather business requirements to
be followed up with face-to-face needs assessment meetings if needed.
The research and analytic management team can review incoming
requests on a regular basis. Written requests allow for systematic
documentation of requirements and customer needs.
3. Selecting the criteria, process, and frequency for project prioritization: Project prioritization involves ranking projects on the basis of
predetermined criteria and using the ranking to determine the order
for executing the projects. These criteria include project return on
investment, incremental revenue, and incremental number of leads
generated. Key team members and stakeholders should be involved
in the project prioritization process by holding periodic discussion
and prioritization meetings. The prioritization frequency refers to
the frequency of holding discussion and prioritization meetings and
depends on the anticipated duration of projects. It is common to adopt
a monthly frequency of prioritization since overly frequent prioritization is not necessary and can disrupt work schedules.
4. Clearly communicating project delivery timelines and deliverables:
After the priority of a project is determined, the group point person
needs to communicate the project timeline and deliverables to those
who request the group’s service, and effectively manage their expectations throughout the project duration.
Promoting research and analytics
A research and analytics service can be promoted to potential internal and external customers in a number of ways. One approach is the distribution of a periodic e-newsletter to communicate the offerings, the
accomplishments, and future project pipeline of such a service to key
stakeholders. Another approach is the creation of a service web site with
the following key sections.
● Home page
● Who we are
● Engagement process
● Services
● Case studies and success stories
● Events
● Contact us.
■ Book outline
The remaining chapters of the book are organized as follows.
Chapter 2: marketing spending models and
optimization
Chapter 2 introduces the concept of marketing spending modeling for
deriving an optimal overall marketing spending budget and effectively
allocating this budget across different product categories or marketing
communication channels. The chapter then gives a conceptual overview
on how to associate marketing returns with the financial performance of
a firm based on the modern portfolio theory.
Chapter 3: metrics overview
This chapter proposes a step-by-step procedure to guide metric selection
by first introducing the concept of a sales funnel and its five stages. It
then discusses the various types of metrics commonly used in marketing
such as return metrics, investment metrics, and operational metrics in the
context of a sales funnel. This chapter also gives an overview on the various marketing communication channels and how they are usually used
across the five key stages in a sales funnel.
Chapter 4: multi-channel campaign performance
reporting and optimization
This chapter discusses how to report and optimize the overall performance of marketing campaigns that utilize multiple communication channels. The performance reporting section examines the identification and
aggregation of common return metrics across multiple communication
channels. The performance optimization section of the chapter discusses
data mining on operational metrics to uncover the operational metrics
with the highest influence on return metrics.
Chapter 5: understanding the market through
marketing research
This chapter discusses creating a deeper understanding of the market through marketing research. Understanding of the market includes
knowledge and insights on the market opportunity and segmentation,
routes to market, and competitive landscape. This chapter also reviews
marketing research fundamental topics such as syndicated research, customized research, primary data, secondary data, survey design and sampling, focus group, and panel group.
Chapter 6: data and statistics overview
This chapter discusses data and statistical concepts that drive selection of
data mining techniques for solving marketing problems. Topics such as
data types, data distributions, and sampling methodologies are reviewed
in detail.
Chapter 7: introduction to data mining
This chapter examines an array of widely utilized data mining techniques applied to marketing by providing a theoretical overview of each
technique and discussing specific examples for some of the techniques.
Standard data mining procedures such as data exploration, modeling,
validation, and testing are introduced. The following data mining techniques are covered.
● Association analysis
● Analysis of variance (ANOVA) and multivariate analysis of variance (MANOVA)
● Canonical correlation analysis
● Cluster analysis
● Collaborative filtering
● Conjoint analysis
● Correspondence analysis
● Decision tree
● Discriminant analysis
● Factor analysis
● Logistic regression
● Multi-dimensional scaling (MDS)
● Principal component analysis (PCA)
● Time series.
Chapter 8: audience segmentation
Chapter 8 presents four case studies on audience segmentation to illustrate the application of four data mining techniques: cluster, CART,
CHAID, and discriminant analysis. The four case studies are on behavior
and demographics segmentation, value segmentation, response behavior
segmentation, and customer satisfaction segmentation. Each case study
gives the background of a business problem, the data mining technique
applied to address the problem, the data mining model building and validation processes, and the marketing recommendations resulting from the
data mining analysis.
Chapter 9: data mining for customer acquisition,
retention, and growth
This chapter discusses three case studies on targeting, growth and
retention models to demonstrate the application of the logistic regression technique. Each case study examines the background of a business problem, the data mining technique used to solve the problem, the
data mining model building and model validation processes, and the
recommendations.
Chapter 10: data mining for cross-selling and
bundled marketing
This chapter discusses two case studies on e-commerce and targeted online
advertising promotions. In both case studies, the fundamental data mining
techniques for cross-sell and up-sell are applied to real marketing scenarios.
Chapter 11: web analytics
The chapter introduces the fundamentals of web analytics and its key metrics by business objectives such as lead generation and online e-commerce.
It also introduces syndicated research tailored for understanding web marketing trends and online customer behavior.
Chapter 12: search marketing analytics
This chapter discusses the principles of three search marketing disciplines: search engine optimization (SEO), search engine marketing (SEM),
and onsite search. The chapter also provides links to web resources on
subjects such as key words, domain, meta tags, and pay per click.
■ References
Aaker, D. Strategic Market Management, 7th ed. John Wiley & Sons, New York,
2005.
Bennett, P.D. Dictionary of Marketing Terms. American Marketing Association,
Chicago, Illinois, 1988.
Berry, M.J.A., and G.S. Linoff. Data Mining Techniques: For Marketing, Sales, and
Customer Support. John Wiley & Sons, New York, 1997.
Blankenship, A.B., and G.E. Breen. State of the Art Marketing Research. Chapter 12,
page 277. NTC Business Books, Lincolnwood, Illinois, 1995.
Doyle, P. Value-Based Marketing – Marketing Strategies for Corporate Growth and
Shareholder Value. John Wiley & Sons, New York, 2000.
Farris, P.T., N.T. Bendle, P.E. Pfeifer and D.J. Reibstein. Marketing Metrics: 50+
Metrics Every Executive Should Master. Wharton School Publishing, Upper
Saddle River, New Jersey, 2006.
Groth, R. Data Mining – Building Competitive Advantage. Prentice Hall PTR, Upper
Saddle River, New Jersey, 2000.
Lenskold, J.D. Marketing ROI – The Path to Campaign, Customer, and Corporate
Profitability. McGraw Hill, New York, 2003.
CHAPTER 2
Marketing Spending Models and Optimization
Two of the most important questions in marketing refer to how much
should be spent and how the budget should be allocated. These questions
can be answered in more than one way, depending on the particulars of
the firm’s circumstances and the availability of data. In this chapter we
address these important issues in an econometric framework.
■ Marketing spending model
The primary objective of a marketing spending model is to establish a
relationship between marketing investments and marketing returns.
Marketing returns are the benefits that a firm receives when it invests in
marketing, such as sales value or number of product units sold. A properly devised marketing spending model also helps us understand the way
these variables interact, allowing us to gain deeper insights into what is
most effective at influencing marketing return.
Most of us are aware of potential diminishing returns of marketing spending. That is to say, as marketing spending increases, its incremental or marginal
impact eventually starts to decrease. This is just one of the many features
of the complex relationship between marketing spending and revenue. The
exact relationship between marketing return and marketing spending can
take on many different mathematical forms depending on factors such as
the type and frequency of data and the industry segment where the model
is applied. Typically, various functions need to be explored to derive the
best model and the best corresponding function form. In this book, we will
use the term ‘marketing spending model’ to name models that describe the
relationship between marketing spending and marketing return.
In addition to marketing spending as an independent variable impacting marketing returns, there are other potential independent variables
such as seasonality, product price, and the state of the economy.
The functional form that describes the relationship between marketing
spending and return, which may be quite general and complex in nature,
must above all lend itself to calibration in a stable and predictable way.
A comprehensive and detailed description of marketing spending model
mathematics is beyond the scope of this book. Here we give a brief summary of the issues involved to guide the reader into a more extensive
treatment of the subject, such as the excellent book by Hanssens, Parsons,
and Schultz, where an extensive list of valuable references can be found.
In some sources, the term ‘market response model’ is used in place of
‘marketing spending model’, used in this book. There are two reasons
for settling on the term ‘marketing spending model’. One reason is that
it clearly conveys the message that such a model is related to marketing
spending. Another reason is that it avoids confusion with the terminology
used in targeting analysis, where ‘market response model’ is often used
to refer to ‘response targeting models’.
For any marketing spending model to be effective, it must be data based.
This reliance on data is what configures an empirical marketing spending
model (Hanssens, Parsons and Schultz 2001). Data enters in the formulation and calibration of a marketing spending model in two primary ways:
sequentially and cross-sectionally. Sequential data comes in the form of time
series information, consisting of values at discrete points in time. Cross-sectional data describes values that occur at the same point in
time where these values can belong to time series. In general, when generating cross-sectional data, we deal with multidimensional time series – that
is, discrete and simultaneous information of several variables.
Although the primary framework in setting up and calibrating marketing spending models is data based, at the inception of a new marketing
plan the relevant data may not be available. In this case, the data-based
model is preceded by an initial growth stage, where parameters are set
based on the subjective judgment of experienced managers.
Our assumptions about markets as well as the level of detail that we
want to capture influence the modeling task. We may, for example, assume
that the market parameters are stationary, such as constant demographics
and employment level, or that market parameters are evolving very slowly
in comparison with our planning horizon. In such cases, our models will
be designed to respond appropriately in stationary markets. A different and
more complex situation arises if markets are evolving rapidly when compared with our planning horizon. In such a case, a different class of models
would be required to capture the intrinsic dynamics of the market.
If we consider the market to be stationary, there are two types of models we can postulate, each corresponding to a different level of analysis.
At a simpler level of analysis, we may assume that the sales and drivers adjust instantaneously as their level changes. This means that whatever functional relationships we formulate between our variables only
involve time to the extent that the variables change in time, not to the rate
of change of the variables in time. A situation like this reflects equilibrium
among variables and the types of models appropriate for this case are
referred to as static models.
From the perspective of time series data, static response models involve
the marketing investment variables evaluated at a single point in time.
Simple regression models fall in this category.
Within the assumption of stationary markets, at a more complex level
of analysis we may consider the time of adjustment among variables as
their levels change. This means that our model will capture not only the
time changes in the levels of the variables, but also the rates at which the
variables change in time. A model capable of capturing the noninstantaneous adjustment among variables is referred to as a dynamic model.
Dynamic models involve the marketing investment variables evaluated
at multiple points in time. From a time series’ point of view, this implies
that the model will involve lags and leads. Simple auto-regressive models
are examples of this category.
In addition to the two time effects we have described so far – the
response to the level of variables as opposed to both the level and the
level changes of variables – there is yet another time effect imposed by
market fluctuations. To properly capture market fluctuations we must
formulate models in nonstationary or evolving markets. Models of this type
must be able to capture the nonstationarity of the statistical properties of
time series. Auto-regressive moving average models (ARMA) are examples of this type of model.
Here we will limit the discussion to stationary markets. This is the situation we face when our marketing planning horizon, usually on a quarterly basis, is relatively short compared with the time evolution scales of
economic or demographic effects.
Static models
A model can be very complex if the full interaction among variables is
taken into consideration. A simple static model is of the form
$$ Q = c_0 + c_1 f(X) \qquad (2.1) $$
where Q is the dependent variable of interest, such as sales volume, X is
the independent variable, in this case the marketing spending in the marketing plan, and c0 and c1 are coefficients to be estimated. Independent
variables are also called explanatory or predictive variables.
If function f(X) is linear, the model is referred to as a linear model.
Notice that this notion of linearity is not the same as the concept of linearity we encounter in estimation problems. In estimation, linearity refers to
the way in which the coefficients c_0 and c_1 enter the functional form of the
model. In estimation problems, the formulation is called linear if the estimation coefficients enter linearly, even if the independent variables appear
in a nonlinear form. The important distinction in this regard is that as
long as the estimation coefficients appear linearly in the model they can
be estimated by linear regression.
Elasticity
An important concept to characterize a model is the notion of elasticity
(Hanssens, Parsons and Schultz 2001). Elasticity is the ratio of the relative
changes of the dependent and independent variables.
e_X = \frac{\Delta Q / Q}{\Delta X / X} = \frac{\Delta Q}{\Delta X}\,\frac{X}{Q}    (2.2)
If there are several explanatory variables, the elasticity with respect
to one variable is computed keeping the other variables constant.
Mathematically, for infinitesimal changes in X and Q the elasticity can be
written in terms of the partial derivative
e_X = \frac{\partial Q}{\partial X}\,\frac{X}{Q}    (2.3)
Simple linear model
In the simplest case of the linear model
Q = c_0 + c_1 X    (2.4)

the elasticity is

e_X = \frac{c_1 X}{c_0 + c_1 X}    (2.5)
eX A model of this form reflects the assumption that additional marketing
spending results in the same increment in the dependent variable, regardless of level. This situation is referred to as constant return to scale. It is
more realistic to assume a situation of diminishing returns, where additional marketing spending brings about increasingly smaller responses.
The constant elasticity model we analyze in the next section accomplishes
this objective.
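As a quick numerical illustration of Eq. (2.5), the following Python sketch (with hypothetical coefficients c_0 = 100 and c_1 = 2, chosen only for illustration) evaluates the elasticity of the linear model at several spending levels; it approaches one as spending grows, reflecting the constant-return-to-scale behavior discussed above.

    # Elasticity of the simple linear model Q = c0 + c1*X (Eq. 2.5).
    # The coefficients below are hypothetical, used only for illustration.
    c0, c1 = 100.0, 2.0

    def elasticity_linear(x):
        # e_X = c1*X / (c0 + c1*X)
        return c1 * x / (c0 + c1 * x)

    for x in (10.0, 100.0, 1000.0, 10000.0):
        print(f"X = {x:>8.0f}   elasticity = {elasticity_linear(x):.3f}")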
Power models
An interesting model that accomplishes diminishing returns is known as
the constant elasticity model
Q = a X^b    (2.6)

where 0 < b < 1. This is a particular case of a power model.
The elasticity of this model is constant and equal to b, which gives intuitive meaning to this coefficient. However, the price we pay for having
constant elasticity is that the rate of change of Q for vanishing values of X
is infinitely large, as shown in Figure 2-1.
Figure 2-1    Sales volume as a function of marketing effort in the constant elasticity model.
Another attractive property of this model is that we can estimate the
parameters with linear regression by working with the logarithms of both
sides of Eq. (2.6)
\log Q = \log a + b \log X    (2.7)

The linear regression gives us the slope b directly and the intercept \log a, from which we can extract a.
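As a minimal sketch of this estimation (using numpy and simulated observations rather than real market data; the "true" a and b below are hypothetical values used only to generate the data), we can fit Eq. (2.7) by ordinary least squares on the logarithms; the slope is b and the exponential of the intercept recovers a.

    import numpy as np

    # Simulate observations from Q = a * X**b with multiplicative noise.
    rng = np.random.default_rng(0)
    a_true, b_true = 50.0, 0.4
    X = np.linspace(100, 10000, 200)
    Q = a_true * X**b_true * np.exp(rng.normal(0.0, 0.05, X.size))

    # Fit log Q = log a + b log X (Eq. 2.7) by linear regression.
    b_hat, log_a_hat = np.polyfit(np.log(X), np.log(Q), 1)
    print("estimated b:", round(b_hat, 3), " estimated a:", round(float(np.exp(log_a_hat)), 2))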
The functional form in Eq. (2.6) can be generalized to the case of multiple independent variables in several ways. In the case of two variables,
we have
Q = a_1 X_1^{b_1} + a_2 X_2^{b_2} + a_{12} X_1^{b_{12}} X_2^{b_{21}}    (2.8)
This model captures the interaction of the independent variables in
the last term, but is no longer a constant elasticity model. We can maintain the desirable constant elasticity property by defining our
model as follows
Q = a_{12} X_1^{b_{12}} X_2^{b_{21}}    (2.9)
Notice that the unrealistic behavior of infinitely rapid change of
Q for vanishing values of the independent variables persists in this
formulation.
S-shaped models
If we can argue that the nature of the return changes from an increasing
one to a decreasing one as a function of the independent variable, we can
consider an S-shaped curve. In an S-shaped model, there is a transition
from a convex to a concave return represented by an inflexion point. A
simple function that represents such a shape is the following exponential
model,
Q = \exp\!\left(a - \frac{b}{X}\right)    (2.10)
where both a and b are positive. It is easy to see that the elasticity of this
model decreases with X
e_X = \frac{b}{X}    (2.11)
Figure 2-2 shows the overall shape of this functional form. The inflexion point is located at X = b/2. The fact that this function starts out with a
zero slope means that there is no response to small marketing efforts.
Figure 2-2    Sales volume in the exponential model.
Modifications to the S-shaped model
Other possible modifications to the S-shaped model include imposing a
saturation level that reflects the fact that sales may not increase beyond
certain level of effort, or a sales floor to indicate that sales may still take
place in the total absence of any marketing effort. Functions of this type
capable of describing general S-shapes are called sigmoid functions, of
which the well-known logistic function is a particular case. An example
of a nonsymmetrical one-dimensional logistic model that incorporates
both a sales floor QL and a saturation level QU is the following
Q = \frac{Q_L + Q_U a X^b}{1 + a X^b}    (2.12)
A plot of this function is shown in Figure 2-3. The function starts at
Q_L and asymptotes to Q_U as X grows. If Q_L and Q_U are postulated, the
parameters in Eq. (2.12) can be estimated using the logarithmic form of
the function. For example, for the case of two variables we have
\log\frac{Q - Q_L}{Q_U - Q} = \log a + b_1 \log X_1 + b_2 \log X_2    (2.13)
Figure 2-3    One-dimensional logistic model with sales floor and saturation level.
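A minimal sketch of estimating the parameters of Eq. (2.12) through the transform in Eq. (2.13), for a single explanatory variable, assuming Q_L and Q_U are postulated and using simulated rather than real data (the "true" a and b are hypothetical values used only to generate the observations):

    import numpy as np

    Q_L, Q_U = 200.0, 1200.0            # postulated sales floor and saturation level
    a_true, b_true = 1e-4, 1.0          # hypothetical parameters of Eq. (2.12)

    rng = np.random.default_rng(1)
    X = np.linspace(500, 10000, 300)
    Q = (Q_L + Q_U * a_true * X**b_true) / (1.0 + a_true * X**b_true)
    Q = Q + rng.normal(0.0, 5.0, X.size)          # small additive noise

    # log((Q - Q_L)/(Q_U - Q)) = log a + b log X  (one-variable form of Eq. 2.13)
    y = np.log((Q - Q_L) / (Q_U - Q))
    b_hat, log_a_hat = np.polyfit(np.log(X), y, 1)
    print("estimated b:", round(b_hat, 3), " estimated a:", float(np.exp(log_a_hat)))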
Semilogarithmic model
A semilogarithmic function captures diminishing returns and as a result
is a widely used functional form (Leeflang, Wittink, Wedel and Naert 2000).
In a semilogarithmic model, the number of units of product sold Q and the
marketing spending X follow the relationship

Q = \beta_0 + \beta_1 \ln X    (2.14)
A regression estimate of Q is:
\hat{Q} = \hat{\beta}_0 + \hat{\beta}_1 \ln X    (2.15)
We now apply this functional form in an optimization framework
where we maximize profits with respect to marketing spending. For this
exercise, we define profit as gross profit adjusted by marketing spending.
Gross profit is the difference between revenue and the cost of producing
a product or providing a service, without adjustment for indirect items such as overhead or payroll. Consistent with this definition, profit is given by
P = (p - c)Q - X    (2.16)
where p and c are quantities independent of X representing the unit price
and the unit variable cost of goods sold, respectively.
To maximize profit, the following condition has to be met.
\frac{\partial P}{\partial X} = \frac{\partial\big((p - c)Q - X\big)}{\partial X} = (p - c)\frac{\partial Q}{\partial X} - \frac{\partial X}{\partial X} = (p - c)\frac{\partial Q}{\partial X} - 1 = 0    (2.17)
Based on Eq. (2.17) and replacing Q with its estimator, we get
\frac{\partial \hat{Q}}{\partial X} = \hat{\beta}_1 \frac{\partial \ln X}{\partial X} = \frac{\hat{\beta}_1}{X} = \frac{1}{p - c}    (2.18)
Therefore, the marketing spending that will optimize the profit of the
marketing effort is:
X = (p - c)\hat{\beta}_1    (2.19)
Marketing spending model case studies
Next we discuss two case studies illustrating the use of static models. The
first case study applies the formulation discussed in the previous section,
while the second expands the analysis to include the residual effect over
time of marketing expense.
Case study one Assume a company spent $5100 on a suboptimal marketing effort, sold 800 units of products, and realized a profit of $34,900.
We will apply a semilogarithmic model to the historical data of the firm to
determine the optimal marketing spending that would have maximized
the profit of the marketing effort, and then compare the optimal marketing spending with the amount that the firm actually spent.
To derive the optimal marketing spending which maximizes the profit
of the marketing effort, we use the following parameters in Eqs. (2.15) and
(2.19) based on historical data: p = $100, c = $50, \hat{\beta}_0 = -20, \hat{\beta}_1 = 120. Given
these parameters, the optimal marketing spending is:

X = (100 - 50) \times 120 = 6000    (2.20)

The estimated number of product units sold given the optimal marketing spending is:

\hat{Q} = \hat{\beta}_0 + \hat{\beta}_1 \ln X = -20 + 120 \ln(6000) \approx 1024    (2.21)

The maximum profit of the marketing effort is then:

P = (p - c)\hat{Q} - X = (100 - 50) \times 1024 - 6000 = 45,200    (2.22)
Based on this analysis, we conclude that the optimal marketing spending should be $6000, which is $900, or 18%, above the amount that the company
actually spent. If the company had spent $6000 on marketing, it would
have produced a profit of $45,200, 30% higher than the profit it actually
realized.
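The arithmetic of this case study can be reproduced with a short Python sketch using the parameter values given above:

    import math

    p, c = 100.0, 50.0           # unit price and unit variable cost
    b0, b1 = -20.0, 120.0        # estimated semilogarithmic coefficients
    actual_profit = 34900.0

    X_opt = (p - c) * b1                     # Eq. (2.19): optimal spending
    Q_hat = b0 + b1 * math.log(X_opt)        # Eq. (2.15): estimated units sold
    P_opt = (p - c) * Q_hat - X_opt          # Eq. (2.16): profit at the optimum

    print(f"optimal spend  : {X_opt:,.0f}")                       # 6,000
    print(f"estimated units: {Q_hat:,.0f}")                       # ~1,024
    print(f"optimal profit : {P_opt:,.0f}")                       # ~45,200
    print(f"profit uplift  : {P_opt / actual_profit - 1:.0%}")    # ~30%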
Case study two A dollar spent on marketing activities today drives not
only the sales today, but also sales in the future. The impact of advertising
on generating marketing returns has a residual effect over time. In the next
case study, we incorporate such residual effects in a time series (Leeflang,
Wittink, Wedel, and Naert 2000). We assume that the residual effect at
time t is \lambda^t, where \lambda is the fraction of the effect retained per period, and that the simply compounded discount rate per unit time period
is i. The estimated profit given the optimal marketing spending is:

P = (p - c)\left(\hat{\beta}_0 + \hat{\beta}_1 \ln X\right) \sum_{t=0}^{n} \frac{\lambda^t}{(1 + i)^t} - X    (2.23)
Using the sum of a geometric series, this expression can be simplified
as the following when n is large.
P = (p - c)\left(\hat{\beta}_0 + \hat{\beta}_1 \ln X\right) \frac{1}{1 - \frac{\lambda}{1 + i}} - X    (2.24)
To maximize the profit of a marketing effort with respect to marketing
spending, the following first-order condition has to be satisfied:

\frac{\partial P}{\partial X} = (p - c)\,\frac{\hat{\beta}_1}{X}\,\frac{1}{1 - \frac{\lambda}{1 + i}} - 1 = 0    (2.25)
The optimal marketing spending is given by
X = (p - c)\,\hat{\beta}_1\,\frac{1}{1 - \frac{\lambda}{1 + i}}    (2.26)
Assume a company spent $8000 on a suboptimal marketing effort, sold
1000 units of products, and realized a profit of $42,000. We will apply a
semilogarithmic model to the historical data of the firm to determine the
optimal marketing spending that would have maximized the profit of the
marketing effort, and then compare the optimal marketing spending with
the amount that the firm actually spent.
To derive the optimal marketing spending which maximizes the profit
of the marketing effort, assume now the following parameters: p = $100,
c = $50, \hat{\beta}_0 = -20, \hat{\beta}_1 = 150, \lambda = 20%, and i = 3%.
Based on Eq. (2.26), the optimal marketing spending is
X = (100 - 50) \times 150 \times \frac{1}{1 - \frac{0.2}{1.03}} \approx 9307    (2.27)
Therefore, the maximum profit of the marketing effort that can be achieved is

P = (100 - 50)\left(-20 + 150 \ln(9307)\right) \frac{1}{1 - \frac{0.2}{1.03}} - 9307 \approx 74,509    (2.28)
Based on this analysis, we conclude that the optimal marketing spending is $9307. The company actually spent $8000; this means it underspent
by $1307, or 16%. If the company had spent $9307 on marketing, it would
have produced a profit of $74,509, 77% higher than the actual profit
realized.
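A similar sketch, now including the residual-effect factor of Eqs. (2.24) and (2.26), reproduces the second case study:

    import math

    p, c = 100.0, 50.0
    b0, b1 = -20.0, 150.0        # estimated semilogarithmic coefficients
    lam, i = 0.20, 0.03          # retention (residual) rate and discount rate
    actual_spend = 8000.0

    factor = 1.0 / (1.0 - lam / (1.0 + i))          # sum of the geometric series
    X_opt = (p - c) * b1 * factor                                     # Eq. (2.26)
    P_opt = (p - c) * (b0 + b1 * math.log(X_opt)) * factor - X_opt    # Eq. (2.24)

    print(f"optimal spend : {X_opt:,.0f}")                                 # ~9,307
    print(f"optimal profit: {P_opt:,.0f}")                                 # ~74,500
    print(f"underspend    : {(X_opt - actual_spend) / actual_spend:.0%}")  # ~16%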
Optimal multi-channel marketing spending allocation
In this section we discuss a model that can be used in either a multi-channel
or a multi-product situation, where we evaluate the distribution of the total
marketing spending across multiple marketing communication channels
or products. This model is a modified version of a multi-product model
originally developed by Doyle and Saunders (Leeflang, Wittink, Wedel,
and Naert 2000).
Assume that there are n different marketing communication channels.
Let Qj denote the contribution of marketing communication channel j
to the number of units sold, and let X_j denote the company's
marketing spending on marketing communication channel j.
Q_j = \beta_{0j} + \beta_{1j} \ln X_j + \varepsilon_j    (2.29)
The regression estimate of the total number of units sold Q is given by
\hat{Q} = \sum_{j=1}^{n} \left(\hat{\beta}_{0j} + \hat{\beta}_{1j} \ln X_j\right)    (2.30)
The profit, which is gross profit minus marketing spending, is

P = \sum_{j=1}^{n} cm_j\left(\hat{\beta}_{0j} + \hat{\beta}_{1j} \ln X_j\right) - \sum_{j=1}^{n} X_j    (2.31)
where cm_j = p - c_j, and c_j is the unit variable cost of goods sold through
marketing communication channel j. The optimal marketing spending of
marketing communication channel j results from
\frac{\partial P}{\partial X_j} = \frac{\partial \sum_{j=1}^{n}\left(cm_j(\hat{\beta}_{0j} + \hat{\beta}_{1j} \ln X_j) - X_j\right)}{\partial X_j} = 0    (2.32)
The solution for Xj, the optimal marketing spending of marketing channel j, is
X_j = cm_j \hat{\beta}_{1j}    (2.33)
The total marketing spending is
X = \sum_{j=1}^{n} cm_j \hat{\beta}_{1j}    (2.34)
Therefore, the fractional budget allocation to marketing communication
channel j is as follows.
A_j = \frac{X_j}{X} = \frac{cm_j \hat{\beta}_{1j}}{\sum_{j=1}^{n} cm_j \hat{\beta}_{1j}}    (2.35)
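As an illustrative sketch of Eqs. (2.33) through (2.35), the following computes per-channel spending and budget shares for three channels; the contribution margins cm_j and coefficients b1_j are hypothetical values invented for the example.

    # Optimal spend per channel X_j = cm_j * b1_j (Eq. 2.33) and the resulting
    # budget fractions A_j (Eq. 2.35). All numbers are hypothetical.
    channels = {
        "e-mail":      {"cm": 40.0, "b1": 30.0},
        "search":      {"cm": 40.0, "b1": 55.0},
        "direct mail": {"cm": 40.0, "b1": 80.0},
    }

    spend = {name: v["cm"] * v["b1"] for name, v in channels.items()}
    total = sum(spend.values())

    for name, x in spend.items():
        print(f"{name:<12} spend = {x:8,.0f}   share = {x / total:.1%}")
    print(f"{'total':<12} spend = {total:8,.0f}")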
Optimal marketing spending allocation by product
Assume there are m different marketing products. Let Qk denote the
contribution of the company’s marketing spending on product k to the
number of units sold of product k and let Xk denote the company’s marketing spending on product k.
Q_k = \beta_{0k} + \beta_{1k} \ln(X_k) + \varepsilon_k    (2.36)
The estimate of Qk is given by:
\hat{Q}_k = \hat{\beta}_{0k} + \hat{\beta}_{1k} \ln(X_k)    (2.37)
The profit of the marketing effort, the estimated gross profit adjusted
for marketing spending, is:

P = \sum_{k=1}^{m}\left(cm_k\left(\hat{\beta}_{0k} + \hat{\beta}_{1k} \ln X_k\right) - X_k\right)    (2.38)
where cm_k = p_k - c_k, p_k is the unit price for product k, and c_k is its unit variable cost.
The optimal budget for marketing product k must satisfy the following
condition
\frac{\partial P}{\partial X_k} = \frac{\partial \sum_{k=1}^{m}\left(cm_k(\hat{\beta}_{0k} + \hat{\beta}_{1k} \ln X_k) - X_k\right)}{\partial X_k} = 0    (2.39)
Therefore, the optimal marketing spending for product k is
X_k = cm_k \hat{\beta}_{1k}    (2.40)
The total marketing spending is
X = \sum_{k=1}^{m} cm_k \hat{\beta}_{1k}    (2.41)
The fractional budget allocation to product k is given by
A_k = \frac{X_k}{X} = \frac{cm_k \hat{\beta}_{1k}}{\sum_{k=1}^{m} cm_k \hat{\beta}_{1k}}    (2.42)
Environmental changes and seasonality
By environmental changes we mean situations where a driver of marketing return changes suddenly (more precisely, fast when compared
with the marketing horizon) from one level to another. For example, a
news report that exposes a product in a markedly more favorable or negative light will suddenly change the environment where
the marketing effort is being conducted. The occurrence of such sudden
change is a categorical rather than a numerical event.
In the context of linear regressions, it is straightforward to incorporate
these sudden changes through the introduction of a dummy numerical
variable, Z, which takes on the value zero before the change happens, and
takes on the value one after the change happens.
Equation (2.4) is modified as follows
Q_t = c_0 + d_0 Z + c_1 X_t + d_1 Z X_t    (2.43)
where the subscript t makes explicit the fact that observations at period
t may correspond to different values of the dummy variable. The coefficients are determined through regression. This formulation allows us to
use the tools of linear regression to interpret the parameters in the model
and to assess their confidence intervals.
This idea can be easily extended to handle multiple environmental
changes. A particular case of environmental change is seasonality, where
each season represents a distinct and sudden change in market conditions.
Since there are four seasons, if we take one of them as a reference we need
only three dummy variables to represent the changes due to the remaining seasons. Assuming the reference season is indicated by index 1,
we can extend Eq. (2.43) to handle seasonality as follows.
Q_t = c_0 + d_{02} Z_2 + d_{03} Z_3 + d_{04} Z_4 + c_1 X_t + d_{12} Z_2 X_t + d_{13} Z_3 X_t + d_{14} Z_4 X_t    (2.44)
The dummy variable Zi is one within season i and zero elsewhere. The
dummy variable Z1 does not appear in this expression because index 1
identifies the reference season.
It is possible to extend the same idea to the case of multiple independent variables and to nonlinear functional forms.
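As a sketch of how the coefficients of Eq. (2.43) can be estimated in practice (the seasonal form of Eq. (2.44) works the same way with more dummy variables), the following uses simulated data and numpy's least-squares solver in place of a full regression package; the "true" coefficients are hypothetical and used only to generate the data.

    import numpy as np

    # Simulated data for Eq. (2.43): Q_t = c0 + d0*Z + c1*X_t + d1*Z*X_t + noise.
    rng = np.random.default_rng(2)
    n = 120
    X = rng.uniform(1000, 5000, n)
    Z = (np.arange(n) >= 60).astype(float)      # environment changes at t = 60
    c0, d0, c1, d1 = 300.0, 150.0, 0.05, 0.02
    Q = c0 + d0 * Z + c1 * X + d1 * Z * X + rng.normal(0.0, 10.0, n)

    # Design matrix [1, Z, X, Z*X] and ordinary least squares.
    A = np.column_stack([np.ones(n), Z, X, Z * X])
    coef, *_ = np.linalg.lstsq(A, Q, rcond=None)
    print("estimated c0, d0, c1, d1:", np.round(coef, 3))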
Before leaving this section on static models, we must emphasize that
although in principle we can accommodate a large number of variables
and changes in environmental conditions, the number of variables we can
handle is determined by the statistical properties of the estimated parameters. This is typically a data issue. Unless sufficient and reliable data is
available, estimation will lead to parameters affected by significant error.
In such cases, the functional complexity of a model may be overwhelmed
by the estimation error.
Dynamic models
The objective of a dynamic model is to capture the adjustment time
between dependent and independent variables. Notice that this adjustment time is a separate concept from the fact that the market parameters
themselves may be changing in time.
Capturing the time of adjustment between dependent and independent variables imposes relationships between the variable levels and their
rates of change. This is a problem that can be formulated in terms of differential equations, or in terms of discrete values in time series. We focus
on the latter, since this is established practice in marketing analytics.
The dynamic adjustment between dependent and independent variables may respond to the dissemination of marketing information, to the
anticipation of such information, or to a combination of both. Framing
the problem of dynamic response in terms of dissemination of marketing information leads to the consideration of lags in a time series, while taking
into account anticipation would result in the inclusion of leads in the time
series analysis. Since the most common situation in practice is the delayed
impact of a marketing effort, we focus on the former.
Simple lag model
To formulate the simplest lag model, we reconsider Eq. (2.4) to represent
the situation where the effect of a marketing effort is felt k periods of time
after the marketing effort is implemented. The following modification of
Eq. (2.4) reflects this fact
Q_t = c_0 + c_k X_{t-k}    (2.45)
This means that the return Qt, which we assume to be a constant value
during period t, is the result of the effort X implemented k periods earlier.
This formula can be generalized to K lag periods
Q_t = c_0 + \sum_{k=1}^{K} c_k X_{t-k}    (2.46)
As stated, this representation may present us with challenging calibration
problems. We can get around the calibration issue by imposing additional
structure on the right-hand side of Eq. (2.46) by making assumptions
about the coefficients c_k.
Geometrically distributed lag model
In this particularly popular model, we assume that the impact of marketing spending on marketing performance decreases geometrically as the
number of periods increases. We can formulate this model in terms of a
parameter \lambda, which we can interpret as the fraction of the effect of a current marketing effort that is retained on marketing return in each future period. This parameter is
called the retention rate and its value is typically around 0.5. The geometrically distributed lag model is as follows

Q_t = c_0 + c(1 - \lambda) \sum_{k=0}^{\infty} \lambda^k X_{t-k}    (2.47)
Here, c_0, c, and \lambda are parameters to be estimated. Simple manipulations
of Eq. (2.47) give the following expression (Clarke 1976).

Q_t = (1 - \lambda)c_0 + c_1 X_t + \lambda Q_{t-1} + (u_t - \lambda u_{t-1})    (2.48)

where c_1 = (1 - \lambda)c and u_t is an error term added to Eq. (2.47). We get a
form suitable for regression estimation by setting w_t = u_t - \lambda u_{t-1} in
Eq. (2.48).
In the geometrically distributed model, the short-term effect of the marketing effort is given by c_1, and a fraction p of the long-term impact of the
marketing effort takes effect over \log(1 - p)/\log \lambda periods.
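For instance, a small sketch of this duration calculation (assuming the standard interpretation of the geometric lag; the fraction p and the retention rates below are purely illustrative values):

    import math

    # Number of periods needed for a fraction p of the long-run effect of a
    # marketing pulse to be realized: log(1 - p) / log(retention rate).
    def duration(retention, p=0.90):
        return math.log(1.0 - p) / math.log(retention)

    for lam in (0.3, 0.5, 0.7):
        print(f"retention {lam:.1f}: 90% of the effect within {duration(lam):.1f} periods")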
The estimated values of parameters in discrete time will depend on the
frequency of the time series. When repeated estimations are performed
for comparative purposes, it is important to keep the data frequency constant between estimations. This issue is referred to in the literature under
the concept of data interval bias (Leone 1995 and Clarke 1976).
A number of other formulations along these lines are possible. For
more details, the reader is referred to the references (Hanssens, Parsons
and Schultz 2001 and Koyck 1954).
■ Marketing spending models and
corporate finance
Here we discuss some ideas of how marketing spending models could be
developed in the context of corporate financial objectives. It is commonly
argued that the primary goal of a firm is to maximize shareholder value.
This would suggest that the ultimate objective of a marketing plan is
to maximize the equity value of the firm that undertakes the marketing plan.
Before discussing the possible interactions between marketing effort
and shareholder value, we must be precise about the meaning of maximizing shareholder value. We will assume for the moment that the assets
of the firm can be neatly divided between debt and equity, where equity
holders absorb the vast majority of the firm’s risk and are therefore
expected to be rewarded with the highest returns. In reality, the capital
structure of firms is much more complex than a clear partition of equity
and debt, with components that share both equity and debt-like features
(such as convertible bonds and preferred stock).
Investors elect to invest in the equity of a particular firm because the
equity of that particular firm exhibits a profile of risk versus return that
investors like. Modern portfolio theory tells us that in the long term, a
change in expected equity returns will go hand in hand with changes in
the risk profile of the equity.
The risk profile of the equity of a firm results from the combination of
the market fluctuations of the shares of the firm and the correlation of
those fluctuations with the rest of the market. The task of senior managers is
to position the firm in such a way that its long-term equity growth is as
high and stable as possible consistent with the risk profile of its industry.
This suggests that shareholders will benefit from the marketing effort
indirectly, to the extent that the objectives of senior managers, which are
aided by the marketing effort, are consistent with the interest of the shareholders. Next we examine a proposed framework for integrating a marketing effort with shareholder objectives.
A framework for corporate performance
marketing effort integration
Earlier in this chapter, we discussed a way to capture environmental
changes in the evolution of a time series, where the time series represents a measure of marketing performance, such as sales volume. We can
extend this idea to capture the effect of the implementation of a marketing plan on the statistical properties of equity returns. In our case, the
time series we observe are equity returns, and the sudden environmental change we wish to capture and quantify is the onset of the marketing
plan.
What precisely is the statistical property of equity returns that we aim
to enhance through the marketing investment? We address this question by invoking modern portfolio theory (MPT). The theory tells us that
the fair or risk-adjusted returns on an investment in a particular asset,
R_{asset}, the return on a short-term risk-free security, R_f, and the returns
on the overall market, R_{market}, are related by the following expression
(Luenberger 1996).
E[R_{asset}] = R_f + \beta_{asset}\left(E[R_{market}] - R_f\right)    (2.49)
where \beta_{asset}, known as the 'beta' of that particular asset, is defined as
follows

\beta_{asset} = \frac{\mathrm{cov}(R_{asset}, R_{market})}{\mathrm{var}(R_{market})}    (2.50)
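To illustrate Eq. (2.50), the following sketch estimates beta from simulated monthly return series; with real data, R_asset and R_market would be observed equity and market index returns, and the "true" beta of 1.3 used in the simulation is purely hypothetical.

    import numpy as np

    # Simulated monthly returns: the asset is driven by the market plus noise.
    rng = np.random.default_rng(3)
    r_market = rng.normal(0.008, 0.04, 240)
    r_asset = 0.002 + 1.3 * r_market + rng.normal(0.0, 0.02, 240)

    # beta = cov(R_asset, R_market) / var(R_market), as in Eq. (2.50)
    beta = np.cov(r_asset, r_market, ddof=1)[0, 1] / np.var(r_market, ddof=1)
    print("estimated beta:", round(float(beta), 2))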
In the light of MPT, we can posit that the purpose of the marketing
effort is to produce asset returns above the risk-free rate that exceed, or at
least maintain, the fair returns predicted by Eq. (2.49).
We start out by assuming that the current long-term returns of the
firm at least adjust according to Eq. (2.49). To see whether the marketing
spending does indeed have the desired effect, we can conduct the analysis we described in Section 2.1, where Qt is now interpreted as the change
in realized return of the firm’s equity over period t.
R_t - R_{t-1} = c_0 + d_0 Z + c_1 X_t + d_1 Z X_t    (2.51)
A linear regression analysis of this representation tells us whether
the marketing effort is driving returns above the risk-adjusted levels of
Eq. (2.49), or toward this level in case the equity is under-performing.
■ References
Clarke, D.G. Econometric measurements of the duration of advertising effect on
sales. Journal of Marketing Research, Chicago, Illinois, 13: 345–357, 1976.
Hanssens, D.M., L.J. Parsons, and R.L. Schultz. Market Response Models,
Econometric Time Series Analysis, 2nd ed. Kluwer, Massachusetts, 2001.
Koyck, L.M. Distributed Lags and Investment Analysis. North-Holland, Amsterdam,
1954.
Leeflang, P.S.H., D.R. Wittink, M. Wedel, and P.A. Naert. Building Models for
Marketing Decisions. Kluwer, Massachusetts, 2000.
Leone, R.P. Generalizing what is known about temporal aggregation and advertising carryover. Marketing Science, Hanover, Maryland, 14(3), 1995.
Luenberger, D.G. Investment Science. Oxford University Press, New York, 1996.
CHAPTER 3

Metrics Overview
In this chapter, we discuss the key metrics used for measuring and optimizing return on marketing investment. As we alluded to earlier, using
the wrong metrics may have serious consequences on marketing returns,
and may in fact drive a company out of business. Consider the following
scenario.
John, marketing director at Sigma Corporation, tracks 75 metrics to measure the impact of his company website in generating online sales. He has
two dedicated full-time staff members compiling reports on these metrics
on a daily basis. In addition, he has a full-time web developer focusing on
changing website features to increase web traffic. Tom is the marketing director of Huber Sigma Corporation, the main competitor of Sigma. On average,
Tom tracks one return (online sales revenue) metric. In addition, he tracks
10 operational metrics about his company website. Over the past year, for
the same product category, Sigma’s online sales dropped 40%, while Huber
Sigma’s doubled. What could have possibly gone wrong with Sigma?
The answer is that Tom focused on a few relevant key return and operational metrics, while John pursued every metric available without making
any differentiation among them. In addition, it was suboptimal for John
to focus so much energy on web traffic alone in light of his objective of
generating online sales, since web traffic is really an operational metric,
not a return success metric. This story also shows that focus on the wrong
metrics not only costs business, but also drains resources unnecessarily.
The following is a list of key questions and issues addressed in this
chapter.
1. What are the common metrics used to measure returns and
investments?
2. What is the most appropriate formula for return on investment?
3. What are the common challenges of tracking returns on investments?
4. What is the process of identifying appropriate metrics?
5. What are the stages in a sales cycle (a.k.a. sales funnel)?
6. What are the common marketing communication channels?
7. What are the key metrics in each stage of the sales cycle?
8. What is the difference between return metrics and operational metrics? How do we use both types of metrics to drive future campaign
optimization?
9. What are the tips on addressing common ROI tracking challenges?
■ Common metrics for measuring
returns and investments
In order to properly measure marketing returns on investment, we need
to identify appropriate return metrics and investment metrics. We start
out with a discussion on return metrics.
Measuring returns with return metrics
Essentially, a return metric is a variable that can be measured and used to
quantify the desired end result of a marketing effort aimed at migrating a
target audience from stage to stage in the sales cycle. If the purpose of a
marketing effort is to move a target audience from being leads to becoming buyers, then return metrics measure and quantify buyers and their
purchases. In this case, the number of buyers, the number of units sold,
the total purchase amount, and the average purchase amount are examples of return metrics.
Return metrics can be expressed in either financial or nonfinancial
terms. Financial return metrics are metrics measured in dollars. Realized
revenue is an example of financial return metrics. Nonfinancial return
metrics are metrics that are not measured in dollars, but which nevertheless are important indicators of short-term marketing success. Examples
of nonfinancial return metrics are incremental increase in awareness,
number of responders to a marketing promotion, number of leads (potential buyers), and increase in customer satisfaction. In business-to-business
situations, where the sales cycle is usually long, nonfinancial return metrics can be particularly important if the amount of investment in driving
this type of short-term return is significant. Take as an example the advertising market. The value of the overall advertising market in 2006 in the
US was $292 billion (Mediaweek 2006). Most of the advertising dollars are
spent in driving the awareness or perception of a brand, product, and service rather than to generate revenue in the short run. If we only account
for financial returns for advertising in the short run, then the short-term
returns of advertising may be negligible. In reality, advertising can drive
financial returns in the long run. Another reason for measuring short-term
returns of marketing efforts is that marketing planning and execution
sometimes have a shorter time frame than the sales cycle and are run and
optimized ongoingly. As a result, the effectiveness of a campaign needs to
be determined before potential customers make any purchases.
Measuring investment with investment metrics
Investment metrics refer to investment costs, such as marketing spending.
These costs can be either fixed or variable. Fixed costs occur regardless
of the implementation of a particular marketing campaign. Examples of
such costs include the costs of existing marketing staff members and the
upkeep of office facilities that are spread across marketing and other corporate functions, such as sales, operations, and information technology.
Fixed costs are not allocated to specific marketing programs. One marketing program manager might be responsible for an e-mail program for
product A and a direct mail program for product B simultaneously, and it
can be a challenge to estimate how much time this individual spends on a
particular marketing program. Other examples of fixed costs are building
lease cost and information technology support costs.
Variable costs are costs that can be accurately attributed to specific marketing programs. Such costs include media costs, agency costs, production costs, and postage for a specific marketing program.
One frequently asked question is which cost category should be
included when accounting for marketing investment costs. If fixed costs
can be attributed to specific marketing efforts without making assumptions that will lead to significant errors in cost allocation, then the correct thing to do is to consider both fixed and variable costs in estimating
marketing investment. However, in situations where fixed costs cannot be
accurately attributed to specific marketing efforts, the correct thing to do
is to steer away from fixed costs and consider variable costs only. If fixed
and variable costs are both imputed, then they must be applied to all
marketing programs. If only variable cost is used, then variable cost must
be applied to all marketing efforts that the firm undertakes. Consistency
is an important consideration when deciding on the application of fixed
and variable costs.
■ Developing a formula for return
on investment
Return on investment (ROI) is defined as the following ratio.
ROI = (Return - Investment) / Investment    (3.1)
The numerator is the difference between return and investment, and
the denominator is investment. Investment is therefore given by
Investment = Costs of Goods Sold + Costs of Capital + Marketing Investment + Additional Investment    (3.2)
In practice, in addition to immediately realized revenue amounts, usually there are potential future revenue streams. The best way to accurately quantify this type of financial returns is to compute the sum of the
net present values of these streams of revenue. This is called the lifetime
value (LTV) of a campaign and should be factored in as the total return of
a marketing campaign whenever possible.
LTV = NPV(sum of future revenue streams) = \sum_{i=0}^{n} \frac{R_i}{(1 + d_i)^i}    (3.3)
where Ri is the revenue at the end of time period i, di is the discount rate
at time period i, and n is the total number of time periods in a lifetime.
Caution needs be exercised when accounting for LTV for multiple marketing campaigns. If several marketing campaigns target the same audience during the same time period and they all account for the LTV of the
same audience as returns, then we have a double counting issue. In these
circumstances, it is best to report the ROI at an aggregate level across
the multiple campaigns by aggregating both the incremental return and
the incremental marketing investment.
The following formula incorporates the LTV concept into the formulation of a marketing effort profit (Leeflang, Wittink, Wedel, and Naert
2000). Note that we also need to compute the NPV of LTV for costs of
goods sold (COG), cost of capital, marketing expense, and additional
marketing expenses.
Profit = \sum_{i=0}^{n} \frac{R_i}{(1 + d_i)^i} - \sum_{i=0}^{n} \frac{D_i}{(1 + d_i)^i} - \sum_{i=0}^{n} \frac{M_i}{(1 + d_i)^i} - \sum_{i=0}^{n} \frac{A_i}{(1 + d_i)^i}    (3.4)
where D_i is the cost of goods sold and cost of capital at time period i, M_i is
the marketing expenses at time period i, A_i is
the additional expenses at time period i, and d_i is the discount rate at
time period i.
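As a sketch of Eqs. (3.3) and (3.4), the following computes the LTV and the discounted profit for hypothetical annual cash-flow streams and a flat 10% discount rate (in practice the streams and the per-period discount rates d_i would come from the firm's own forecasts):

    # Hypothetical per-period streams over a three-year (annual) horizon.
    revenue    = [50000.0, 30000.0, 20000.0]   # R_i
    cogs_cap   = [20000.0, 12000.0,  8000.0]   # D_i: cost of goods sold and capital
    marketing  = [10000.0,  2000.0,  1000.0]   # M_i
    additional = [ 2000.0,  1000.0,   500.0]   # A_i
    d = 0.10                                   # flat discount rate

    def npv(stream, rate):
        # Net present value of a stream indexed i = 0, 1, 2, ...
        return sum(v / (1.0 + rate) ** i for i, v in enumerate(stream))

    ltv = npv(revenue, d)                                                      # Eq. (3.3)
    profit = ltv - npv(cogs_cap, d) - npv(marketing, d) - npv(additional, d)   # Eq. (3.4)
    print(f"LTV (NPV of revenue): {ltv:,.0f}")
    print(f"Discounted profit   : {profit:,.0f}")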
■ Common ROI tracking challenges
It is a well-known fact in the marketing community that ROI is crucial but
often difficult to track. The following are some common reasons behind
the challenge of tracking ROI.
● No clearly defined business objectives and corresponding metrics
● Confusion over what a 'true' return is
● Confusion over true return metrics versus other variables such as operational metrics
● No access to prospect, customer, or sales data
● No system or process for integrating multiple data sources
● Information overflow: too much data and too little insight or intelligence
● Data quality issues: data is not clean and reliable
● Unable to quantify marketing contribution to a sale transaction
● Unable to attribute sales to a particular channel such as offline sales due to online marketing
● Long sales cycle hindering proper control of environmental factors and effective tracking
● No cost efficiency threshold to determine when to continue or stop spending
● No prioritization of metrics in terms of their importance
● Inability to quantify marketing impact on the company bottom line: marketing often perceived as a cost center.
The list of challenges can easily be extended. On a positive note, there
is growing interest and determination in the marketing community to
overcome these challenges wherever they occur.
Throughout this book, we will review means and tools that can help
address some of the challenges listed above.
■ Process for identifying appropriate
metrics
Figure 3-1 shows a step-by-step approach for identifying ROI metrics. This approach ensures that the selected metrics are well aligned
with the business objectives and are able to track returns on investments
effectively.
Figure 3-1    Step-by-step process for metrics identification: Step 1, identification of the overall business objective; Step 2, understanding the impact of a marketing effort on target audience migration; Step 3, selection of appropriate marketing communication channels; Step 4, identification of appropriate return metrics by stage in the sales cycle; Step 5, construction of ROI metrics with return metrics and investment cost.
Identification of the overall business objective
A business objective is a desired outcome of a marketing effort. The following is a list of common business objectives.
● Increasing brand or product awareness in the target audience
● Educating the target audience
● Generating interest in particular products or services
● Generating leads
● Acquiring new customers
● Minimizing customer attrition or increasing customer loyalty
● Increasing revenue from existing customers by selling them additional products (cross-sell or up-sell)
● Increasing profitability
● Increasing customer satisfaction, renewals, or referrals
● Increasing market share and penetration.
Correct identification of the business objectives is crucial to the selection of appropriate metrics to effectively measure the success of marketing investment. A very common mistake in marketing is the misalignment
between business objectives and metrics tracked. For instance, it often
happens that the number of leads is tracked in the context of a brand
awareness program. Brand awareness programs are meant to increase the
awareness level among the audience, not to generate leads. Leads may be
generated as a by-product of a brand program, but should not be used as
the sole metric to judge the success of the program.
Understanding the impact of a marketing
effort on target audience migration
After the business objectives have been determined, we must identify the
target audience and where it is in the sales cycle. In general, there are five
stages in a sales cycle (or sales funnel). They are awareness, interest and
relevance, consideration, purchase, and loyalty and referral (Figure 3-2).
The awareness stage
At this stage, prospects are exposed to information about companies,
products, or services. This information could be a review of what they
already know or completely new information. At this stage, we don’t
expect prospects to immediately make purchases. However, their understanding of a company, a product, or service deepens at this stage and
their likelihood of making a purchase later increases.
There are different types of awareness, such as awareness of a brand,
product, or service. We will use the Prozac.com website as an example,
illustrated in Figure 3-3.
The website provides information about depression as a disease and
Prozac as a medicine. The 'Disease Information' section serves as a high-level education source for depression as a disease.
Figure 3-2    A five-stage sales cycle (awareness, interest and relevance, consideration, purchase, loyalty and referral) and the corresponding audience at each stage (prospects; website visitors, inquirers, and responders; leads; customers; and high-value customers, satisfied customers, and advocates).

Figure 3-3    Depression and Prozac awareness (Source: Prozac.com website 2006).
The 'How PROZAC Can Help' section gives an overview of Prozac and how it can help
depression patients. Both sections help raise awareness about depression
and about Prozac as one of the drugs for mitigating depression. These two
sections by themselves may not get readers to purchase Prozac right away,
however. What is expected is that once depression patients build enough
awareness and knowledge about the disease and the drug, they will be
interested in making an inquiry at their doctor ’s offices or through other
channels.
Our primary objective at the awareness stage is to accomplish an incremental degree of brand perception by the target audience. Increase in
awareness, usually measured through survey studies, is a common return
metric at the awareness stage.
The interest and relevance stage
At this stage, prospects may exhibit interest after their awareness (of a
brand, product, or service) has been brought to a certain level. They may
feel that the product or service is relevant to their needs and preferences
and they may respond by requesting more information or filling out a
survey.
We review the Prozac.com example again. Some web visitors will
proceed to the ‘Next Steps’ section once they build enough awareness
and interest in Prozac. There are five suggested actions under ‘Next
Steps’. They are ‘Asking your Doctor ’, ‘Balanced Living’, ‘Support
Organizations’, ‘Caring for Others’, and ‘Request More Information’.
While ‘Balanced Living’, ‘Support Organizations’, and ‘Care for Others’
are sections intended to further educate visitors, the other two sections,
‘Asking you Doctor ’ and ‘Request More Information’ require some sort
of ‘action’ on a visitor ’s part. When a visitor takes an action, it means that
he has passed the awareness stage and has moved on to the ‘Interest and
Relevance’ stage.
Any metric that quantifies interest, such as number of responders, is a
suitable return metric for this stage.
The consideration stage
At this stage, customers or prospects exhibit sufficient interest to consider
a purchase. They are willing to engage with sales or customer service
teams in a dialogue about their needs and potential purchases. In the consideration stage, the audience is willing to invest more time in interactions with marketers than in the interest and relevance stage.
In the Prozac.com example, sales leads can be generated through different scenarios. The most common scenario is through a doctor ’s prescription since usually a doctor will prescribe a drug that suits a patient’s
physical and mental conditions. A patient that has gone through the
awareness stage and the interest and relevance stage will likely discuss
the potential use of Prozac with his doctor.
Any metric that quantifies consideration, such as the number of qualified leads, is an appropriate return metric for this stage.
The purchase stage
By the time a prospect reaches this stage, he has a clear need and
has gathered sufficient information about a certain product or service.
These prospects are likely to have gone through a test trial with the product and are getting close to making an outright purchase. They are convinced that the product or service can address their needs and that they
can afford it.
In the Prozac example, after a patient has gone through the awareness
stage, the interest and relevance stage, and the consideration stage, he is
ready to make a purchase. The action of a purchase characterizes the purchase stage.
At the purchase stage, we need to quantify buyers and purchases
among the target audience. Common return metrics at this stage include
number of buyers, number of transactions, total sales, and average sales
per transaction.
The loyalty and referral stage
In this stage, customers are very satisfied with a product or service and
can be viewed as loyal customers. They begin to spread positive ‘word
of mouth’ (WOM) and actively refer others to the product or service. The
importance of WOM cannot be overemphasized. WOM is particularly
effective when large transactions and large investments on the part of the
purchaser are involved. Customers at this stage are the most loyal ones
and should be treated with extreme care. In fact, there is a new discipline
of marketing called social marketing that capitalizes on customer referrals.
Examples of return metrics at this stage are customer satisfaction level,
customer tenure, total historical purchase amount, the number of repeat
purchases, the number of referrals, and revenue generated as a result of
referrals.
So far, we have introduced the process of metrics identification and
described the typical sales cycle and the target audience engaged at each
stage of this sales cycle. Now we focus on marketing communication channels that are best suited for each stage of the sales cycle. To some extent,
selection of marketing communication channels drives metric selection.
Selection of appropriate marketing
communication channels
After identifying where to migrate the audience within the sales cycle, we
need to leverage the most effective marketing communication channels
to accomplish the migration. This section is designed to provide an overview of how various marketing communication channels are utilized.
Marketing communication channels are classified as online or offline
channels. Table 3-1 classifies the commonly used marketing communication channels by their online or offline nature.
Table 3-1    Key marketing communication channels (online vs. offline)

Online (Internet): banner, search, online community, website, e-mail (including electronic newsletter), webinar

Offline: TV, print (e.g., newspaper/FSI, magazine), radio, billboard, physical store, direct mail (e.g., catalog, postcard, letter, newsletter), telemarketing, trade show, seminar
Another way to classify marketing communication channels is by
whether they are mainly used for broad reach or targeting. Table 3-2 shows
a classification of key marketing communication channels in this manner.
Table 3-2    Key marketing communication channels (broad reach vs. targeting)

Broad reach: banner, search, online community, website, TV, print (e.g., newspaper/FSI, magazine), radio, billboard, physical store

Targeting: direct mail (e.g., catalog, postcard, letter, newsletter), telemarketing, trade show, seminar, e-mail (including electronic newsletter), webinar
As a general rule of thumb, we use broad reach channels to communicate to a broad audience in the initial stages of the sales cycle, such as
awareness, and apply targeting channels to reach specific individuals
in later stages. Table 3-3 shows how marketing channels are often used
across the five stages in the sales cycle. Please note that some marketing
communication channels may be used for either broad reach or targeting.
Table 3-3    Marketing communication channels by stage in sales cycle

Awareness: TV, print, radio, billboard, direct mail
Interest and relevance: TV, print, radio, billboard, direct mail, e-mail, online banner, search, website
Consideration: direct mail, e-mail, search, website, telemarketing
Purchase: direct mail, e-mail, search, website, telemarketing, physical store
Loyalty and referral: e-mail, online banner, physical store, online community, search, website, telemarketing
In what follows we give a summary overview of marketing communication channels.
Broadcast channels
TV, radio, billboards, newspaper, and magazines are often used to reach
a broad audience to generate awareness. However, there are ways to
target a more specific type of audience. In the case of newspapers and
magazines, those who read the Wall Street Journal have a different demographic profile than those who read other papers. The Wall Street Journal
caters to affluent professionals. Another example is the ‘Parent’ magazine,
which appeals to an audience with children.
Online advertising
This group of media caters to either a broad or a more targeted audience.
For example, the home page of Yahoo.com attracts a broad audience while
the various sections of the site attract more targeted audiences. Visitors to
the Yahoo Finance site tend to be more interested in finance and investment,
while those visiting Yahoo Travel are interested in travel. Next we provide
an overview of online advertising (advertorial.org 2006 and iab.com 2008).
The content of online advertising can be text, standard graphs (GIF,
flash), or rich media. A rich media ad is a web ad that uses advanced
technologies such as a streaming video and an applet. Online ads also
have many different formats in terms of style and size.
Sponsored text links are one of the latest trends in online advertising.
Although less flashy than rich media, text links are often perceived as
content rather than advertising.
The word ‘advertorial’ is a combination of two words, advertising and
editorial. It refers to an advertisement written in the form of an editorial
to give an appearance of objectivity.
A full banner (468 × 60) is the classic format (468 pixels in width and
60 pixels in height) and usually resides at the top of a web page. Even
though newer, smaller formats are being utilized, this banner format is
still delivering some of the best results. The sheer size of this format gives
it the ability to attract more attention.
In 2001, a group of new size formats were introduced to allow for a
more flexible integration of online ads into web content. There are four
common rectangular formats and one common square format.
● Rectangle: 180 × 150 pixels
● Medium Rectangle: 300 × 250 pixels
● Large Rectangle: 336 × 280 pixels
● Vertical Rectangle: 240 × 400 pixels
● Square: 250 × 250 pixels
A leaderboard is a popular format that was originally used in sports.
A leaderboard usually sits between the title area of a web page and its
content. The standard size for a leaderboard is 728 × 90 pixels, and it can
consist of text and animation.
Some newer formats have been developed to utilize the extra space of
a web page or to make web pages more exciting. These newer formats
include skyscraper, interstitial, superstitial, floating ad, pop-up, pop-under, and rollover.
A skyscraper is an economical way of using web space. Contrary to
traditional banners that use horizontal space, a skyscraper uses vertical space. The standard formats of a skyscraper are 120 × 600 pixels and
160 × 600 pixels; the latter is called a wide skyscraper.
An interstitial ad is an ad that appears on a website when a visitor
clicks a link to a content page on a site. The visitor will first see the interstitial ad before seeing the requested page. Interstitial ads need to be used
very carefully as visitors may find this type of ad intrusive.
A superstitial is an interactive (and sometimes entertaining) online ad
with a flexible size. The first superstitial was designed for the Superbowl
event in 2000. Superstitials can have animation, sound, and even video
elements. It has the look and feel of a television ad. Like interstitial
ads, superstitial ads are activated when a visitor goes from one page to
another.
A floating ad is an online advertising format that is superimposed on
the web page that a visitor requests. It is usually triggered either when rolling
over an ad or when the content page loads. It usually disappears automatically after 5–30 seconds.
A pop-up is a small window that is initiated by a single or a double
mouse click. The small window usually sits on one area of the web page
that a visitor is viewing. A visitor can get rid of a pop-up by closing the
small window. A pop-up is often considered intrusive and should be used
with caution. There are many pop-up blockers on the market now that
block activation of pop-up ads.
The pop-under ad is one of the latest innovations in online advertising.
Unlike pop-ups that block part of the content on a web page, a pop-under
is a small window that appears under the main window of a site. In general, pop-under ads are considered less intrusive than pop-up ads.
A rollover ad is an interesting format that allows marketers to maximize use of the web real estate. Graphs or messages are displayed on the
same banner whenever the visitors rest the mouse for a moment over the
banner, or when they click on the banner.
Search engine marketing
Search engine marketing (SEM) as a marketing communication channel
has gained significant traction lately. SEM has shown measurable impact
on marketing in generating responders, leads, and buyers. Those who
search for certain key words are actively seeking information on a particular subject matter and therefore are already in the interest and relevance
stage of the sales cycle. There are two types of listings: natural listings
(same as organic or editorial listings) and paid listings.
Natural listings are free. Search engines such as Yahoo and Google
send out ‘crawlers’ or ‘robots’ to comb various websites and pages over
the Internet and record relevant web pages in an index. If the content of
a web page is relevant to a particular topic, then the web page will be
indexed under that topic. When someone searches for that topic, this particular web page will be displayed, along with other web pages, as the
search results return.
In contrast, paid listings require advertisers to pay a fee to search
engine companies. Search engine marketing allows an advertiser to promote his product or service by displaying the product or service description (usually called an ad copy) and a link as part of the search result
listings. Advertisers bid on key words for their products or services to
appear in prominent positions in search results. Copies of their advertising on product and services are listed based on their ranking in the bid.
The higher an advertiser is willing to bid on a keyword, the better his
ranking is likely to be. A top ranking ensures that an ad copy will appear
at the top of a paid listing section. Advertisers are charged only when
someone clicks on the paid listing and they pay by the number of clicks.
Advertisers do not pay if there is no click on their listings.
In the example illustrated in Figure 3-4, where a search for 'digital camera' is submitted, the listings appearing under 'Sponsored Links' on the
top and on the right-hand side are paid listings. In this example, Dell has the
top position for the key word 'digital camera' in the sponsored links
section at the top of the web page. Listings that are not under 'Sponsored
Links' are natural listings.
Figure 3-4    Search results for 'digital camera'.
Corporate websites
Corporate websites have become an important marketing communication channel. A well-designed and implemented company website can
serve multiple purposes across the sales cycle. In the awareness stage,
a company’s website can be used to build brand awareness among its
visitors and educate them on the company products and services. In the
interest and relevance stage, the site can be used as a venue for visitors
to register for more information and opt-in for newsletters. Onsite search
functions provide visitors with additional convenience in locating the
products that they are interested in and therefore moves them further
into the sales cycle. In the consideration stage, web registrants and opt-in
members can be further screened for qualified leads and potential buyers. In the purchase stage, the site can serve as an e-commerce marketplace where buyers purchase directly from the marketer. Websites with
a look-up feature for account history offer additional convenience for
buyers to order the same products repeatedly or to review their past purchases and such a feature can further convert buyers to repeat buyers. In
the referral stage, a website can offer blogs, online communities, forums,
or chat rooms to solicit feedback and stimulate WOM marketing. There
is an emerging trend of turning a company website into an activity ‘hub’
that serves customers wherever they are in the sales cycle. According to
a 2006 report (Webtrends 2006), 56% of the CMOs surveyed were using
or planning to use, within one year, their company websites as a hub of
marketing strategy for building relationships.
Direct mail
Direct mail is one of the first media designed for one-to-one direct marketing. Direct mail addresses particular individuals or organizations.
Direct mail formats vary from postcards and letters to catalogs. Usually there
is a clear call to action for direct mail recipients. For example, a business
reply card (BRC) may be enclosed for a recipient to fill out and return
via a business reply envelope (BRE). Sometimes, a web URL or a toll-free
number is given in direct mail for a recipient to visit a website or make
an inquiry call. Those who fill out a BRC, visit a website, or call a toll-free
number are called responders.
E-mail
Like direct mail, e-mail is an excellent medium for one-to-one marketing, soliciting responses and generating leads. Compared with direct
mail, e-mail is less formal but can be delivered faster and more cheaply.
Newsletters
Newsletters can be used in multiple stages in the sales cycle. They can
be used to educate prospects and raise their awareness of a product or to
generate responses, leads, or sales. Furthermore, newsletters can also be
used to cross-sell or up-sell additional products to existing customers. In
terms of format, newsletters can be in either print or electronic form. Print
newsletters addressed to specific individuals or organizations are a form
of direct mail. Electronic newsletters addressed to specific individuals or
organizations are usually in e-mail format.
Telemarketing
Telemarketing is another medium for one-to-one marketing beyond the
awareness stage. This medium is usually more expensive than direct mail
or e-mail. However, telemarketing allows human interaction and intervention in the process while e-mail and direct mail do not. As a result,
telemarketing can be more effective and is widely used as a lead qualification and follow-up tool.
Physical stores
Physical stores are currently where most sales transactions take place
and this is particularly true of high-value and high-consideration products. Stores serve audiences across the various stages in a sales funnel
and are places where prospects, potential leads, and existing customers
congregate.
Tradeshows and seminars
Tradeshows and seminars are two marketing media mainly used for
lead generation purposes. Occasionally, marketers use tradeshows and
seminars to educate prospects on complicated products or services as an
awareness generation mechanism. However, the use of tradeshows and
seminars for awareness generation is usually expensive.
Webinars
Webinars are electronic seminars that have recently gained in popularity.
Webinars are cost-effective and can serve multiple purposes in the sales
cycle. They can be used to educate prospects, solicit responses, or generate leads. Webinars also allow for real-time interaction and questions
and answers (Q&A) and, as a result, make the engagement process more
interesting and effective.
Identification of appropriate return metrics by
stage in the sales cycle
There are five groups of return metrics, in alignment with the five stages
of the sales cycle.
Return metrics at the awareness stage
This group of return metrics measures awareness of brands, products,
or services. Some of these metrics, such as number of recalls, are direct
measures of awareness. Indirect (proxy) measures of awareness, such as
number of impressions or reaches, are used under the assumption that
they will ultimately lead to awareness buildup. Under some situations,
it is difficult or expensive to directly measure the level of awareness, so
indirect measures are used as an alternative to direct measures.
For corporations with good name recognition, such as Cisco, Google,
Microsoft, and Coca-Cola, brand equity value can be one additional
measure for brand awareness. Brand equity value is the monetary value
of a brand. Table 3-4 shows a list of the top twenty corporate brands.
Table 3-5 shows common return metrics at the awareness stage.
Table 3-4 Top twenty corporate brands in 2005 (source: Businessweek 2005)

Ranking   Brand              Equity value ($ million)
1         Coca-Cola          67,525
2         Microsoft          59,941
3         IBM                53,376
4         GE                 46,996
5         Intel              35,588
6         Nokia              26,452
7         Disney             26,441
8         McDonald’s         26,014
9         Toyota             24,837
10        Marlboro           21,189
11        Mercedes           20,006
12        Citi               19,967
13        Hewlett-Packard    18,866
14        American Express   18,559
15        Gillette           17,534
16        BMW                17,126
17        Cisco              16,592
18        Louis Vuitton      16,077
19        Honda              15,788
20        Samsung            14,956
Return metrics at the interest and relevance stage
Genuine interest from prospects emerges when their awareness level
reaches a critical point. The target audience at this stage is engaged
Table 3-5 Common return metrics at the awareness stage

Direct return metrics at the awareness stage: number of recalls; number of media mentions; awareness increase measured by survey
Proxy return metrics at the awareness stage: number of impressions; number of reaches; brand value/equity
and responsive to marketing solicitations, searching for a specific topic,
requesting information, or responding to marketing campaigns. Return
metrics at the interest and relevance stage measure the extent to which
the audience is engaged or responsive to marketing stimulation. Here is a
list of common return metrics at the interest and relevance stage.
● Number of website visitors
● Number of unique website visitors
● Number of new website visitors
● Number of repeat website visitors
● Number of website page views
● Number of clicks on a particular link
● Number of responses to website offers.
In defining these metrics, care must be taken that the quantities used
properly reflect the dimensions involved in the marketing program. For
instance, the number of website visitors may refer to a particular location
or a specific period of time.
The next example shows the use of website visitor data to measure
return of a website at the interest and relevance stage. Company A
launched a website featuring its new product at the end of Sep. 2005. The
main purpose of the site was to increase interest in the new products. The
company tracked the number of unique visitors as the return metric with
a web analytic tool. Table 3-6 shows a steady growth in the number of
unique visitors to the site.
Return metrics at the consideration stage
This group of metrics measures the effectiveness of marketing programs
in generating leads. Leads are defined as those who have sufficient
awareness and interest in a particular product or service to contemplate
making a purchase. The number of leads is a common return metric at
this stage.
Table 3-6 Unique visitors to company A’s website

Month            Number of unique visitors
October 2005     85,006
November 2005    110,193
December 2005    134,500
Return metrics at the purchase stage
This group of metrics includes metrics such as the number of transactions, sales revenue, and average purchase amount per transaction (a.k.a.
AOV, average order value). Here are some common return metrics at the
purchase stage.
● Number of transactions
● Number of buyers
● Total revenue
● Average value per transaction.
The next example illustrates the return metrics at the purchase stage. Cell
phone company A launched an e-mail program targeting 50,000 existing customers to persuade them to renew their phone service plans at an
annual subscription fee of $600. The e-mails had a link to a website where
customers could renew their subscriptions online. Five hundred of those
targeted by the e-mail renewed their phone plans online, resulting in
direct sales revenue of $300,000. The investment cost of the program was
$40,000. The net profit was $260,000. Company A achieved $300,000 direct
sales return with an investment of $40,000. We summarize all the above
statistics in Table 3-7.
Table 3-7 Return metrics of company A’s e-mail program

Return metric at the purchase stage    Value
Number of transactions                 500
Number of buyers                       500
Total revenue                          $300,000
Average value per transaction          $600
Company A was not able to capture offline sales as a result of the e-mail
program. Offline sales occurred in situations where customers received
the e-mail but decided to call to renew instead of doing so online.
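As an illustrative sketch (not part of the original example beyond the figures quoted above), the purchase-stage metrics can be computed directly from the campaign counts:

    # Hypothetical sketch: purchase-stage return metrics for the e-mail renewal campaign
    transactions = 500            # renewals generated by the e-mail program
    buyers = 500                  # one renewal per buyer in this example
    subscription_fee = 600        # annual fee per renewal, in dollars
    program_cost = 40_000         # marketing investment, in dollars

    total_revenue = transactions * subscription_fee       # $300,000
    average_order_value = total_revenue / transactions     # $600
    net_profit = total_revenue - program_cost              # $260,000
    revenue_per_dollar = total_revenue / program_cost      # $7.50 of revenue per marketing dollar

    print(total_revenue, average_order_value, net_profit, revenue_per_dollar)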
Return metrics at the loyalty and referral stage
This group of metrics measures the depth of the relationship between a
marketer and its customers. Some examples of this type of metrics are
customer retention rate, length of customer tenure, the number of purchases per year, and customer lifetime value. Loyal customers can refer
people to the brands that they are familiar with and as a result, generate
potential future business for those brands. Given the popularity of blogs
and other types of online communities, loyalty and referral metrics are
expected to exert increasing influence on purchase behavior.
Here is a list of common return metrics at the loyalty and referral
stage.
● Lifetime value (LTV)
● Purchase frequency
● Tenure (length of time since becoming a customer)
● Number of referrals
● Revenue due to referrals
● Customer testimonials
● Customer satisfaction.
There are other metrics, called proxy metrics, that measure loyalty or
referral indirectly. Customer satisfaction is an example of a proxy metric used under the assumption that satisfied customers are more loyal.
Customer satisfaction is a very important metric in major corporations,
and is often used as one of the criteria for measuring marketing executives’ performance and compensation. The most common method for
acquiring customer satisfaction data is by survey analysis. Next we discuss an example illustrating customer satisfaction survey analysis.
Company A, a consumer electronics manufacturer, recently revamped
its website to enhance visitor experience. New features of the site include
a store locator page with information on the nearest store locations and
store phone numbers, a shopping cart for online direct purchases, and
pages of promotion offers. The company ran an online survey prior to and
after the new site launch to gauge how customer satisfaction may have
changed due to the new site features. Satisfaction was measured from a
rating of one, not satisfied at all, to nine, extremely satisfied. The survey
result shows that the average visitor satisfaction increased from 5.2 to 6.4,
a 23% improvement with an 80% statistical confidence level.
One frequently asked question is how one can tie customer satisfaction improvement to incremental revenue. Surveys are usually conducted
in an anonymous fashion and their results cannot be easily mapped to
the revenue-generating customer base. We need to measure customer satisfaction and revenue consistently in the same audience and apply data
mining analysis to determine the level of correlation between customer
satisfaction and revenue.
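As a hedged sketch (the paired observations below are invented), once satisfaction and revenue are measured for the same customers, their association can be quantified with a simple correlation coefficient:

    # Illustrative sketch: correlating satisfaction scores with revenue for the same customers
    import statistics

    # Hypothetical paired observations: satisfaction rating (1-9) and annual revenue per customer ($)
    satisfaction = [5, 6, 7, 8, 4, 9, 6, 7]
    revenue = [420, 510, 630, 700, 380, 820, 540, 610]

    corr = statistics.correlation(satisfaction, revenue)   # Pearson correlation, Python 3.10+
    print(f"Correlation between satisfaction and revenue: {corr:.2f}")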
■ Differentiating return metrics from
operational metrics
So far, we have identified an appropriate ROI formula and the key return
metrics across the five stages in the sales cycle. However, it is still a common challenge to distill the enormous amount of available marketing
data. This challenge arises primarily from the difficulty in differentiating
return metrics from operational metrics. The key difference between these
two types of metrics is that the former indicates an end result while the
latter focuses on a process. Operational metrics track the footprints of an
audience as they migrate from one stage to the next or within the same
stage in a sales cycle. The majority of the metrics are in fact operational
metrics. To better illustrate the difference between return and operational
metrics, we will go through some exercises to identify appropriate return
and operational metrics in the sales cycle.
If the desired end result is to move the audience from the awareness
stage to the interest and relevance stage and the marketing communication channel is an online banner for generating clicks on ads, then the
return metric should be the number of clicks. The operational metrics are
those metrics that measure how effectively impressions turn into clicks
and the click-through rate is an example of operational metrics. It is a
common mistake to treat the click-through rate as the return metric.
If the desired end result is to move the audience from the interest
and relevance stage to the consideration stage, and the marketing communication channel is direct mail for generating leads, then the appropriate return metric is the number of leads. The operational metrics are
those that measure how effectively responses turn into leads, such as the
response to lead conversion rate.
If the desired end result is to move the audience from the consideration
stage to the purchase stage and the marketing communication channel
is outbound phone followup for generating sales, then the appropriate
return metrics are the number of buyers, revenue amount, and profit. The
operational metrics are those metrics that measure how effectively leads
turn into buyers, such as the lead to buyer conversion rate.
A best practice on campaign reporting is to clearly show the distinction
between return metrics and operational metrics. In a campaign performance report, return metrics need to be placed in more prominent positions than operational metrics.
■ References
Leeflang, P.S.H., D.R. Wittink, M. Wedel, and P.A. Naert. Building Models for
Marketing Decisions. Kluwer, Massachusetts, 2000.
Marketer’s Guide to Media. Mediaweek, New York, 2006.
The advertorial.org website. Montreal, Canada (http://www.advertoroal.org).
Webtrends CMO Web-Smart Report. Webtrends, Portland, OR, 2006.
The Interactive Advertising Bureau website (http://www.iab.net), New York,
2008.
CHAPTER 4
Multichannel Campaign Performance Reporting and Optimization
In this chapter, we focus on the tracking and analysis of marketing returns
from multi-channel campaigns. Marketers often use more than a single
communication channel in a campaign because different communication channels tend to appeal to different segments within a target
audience. In the high-tech business-to-business marketplace, for instance,
marketers often use direct mail to target business decision-makers (BDM)
and online channels to target technical decision-makers (TDM).
■ Multi-channel campaign
performance reporting
It is a constant challenge to report campaign performance on multiple
communication channels. The challenge is compounded by a lack of clarity
about the different roles that different metrics play. Return metrics can
be the same across different channels but operational metrics are often
channel-specific.
Figure 4-1 illustrates a systematic thought process for identifying
appropriate metrics for multi-channel reporting.
Figure 4-1 Metrics identification process for multi-channel campaign reporting:
Step 1: Identify all marketing communication channels and their associated cost and target volume.
Step 2: Identify the overall return or success metrics; aggregate and track these metrics across channels.
Step 3: Select marketing channel-specific return or success metrics.
Step 4: Identify operational metrics by marketing channel.
Step 5: Uncover operational metrics highly associated with channel return.
The first step is to identify all the marketing communication channels
in the campaign and to track their associated cost and target volume.
Consider the following example: Company A launched a multi-channel
campaign with the objective of generating leads to move its target audience to the consideration stage of the sales cycle. The campaign consisted
of four channels: direct mail, e-mail, online banners on external web sites,
and paid search. The cost and the target volume of the four channels are
detailed in Table 4-1.
Table 4-1 Cost and target volume of a multi-channel campaign of company A

Marketing communication channel   Total marketing cost (agency and media costs)   Target volume
Direct mail                       $500,000                                        1,000,000 pieces mailed
E-mail                            $300,000                                        400,000 e-mails delivered
Online banners                    $250,000                                        125,000,000 impressions
Paid search                       $750,000                                        500,000 clicks
Total                             $1,800,000                                      –
The second step is to identify the overall return or success metrics of
the campaign, to aggregate these metrics across all marketing communication channels, and to track them. In the example of company A, the
overall return or success metric is the number of leads generated by the
campaign. However, some channels might have purposes beyond driving the number of leads. For instance, online banners are used to raise
awareness as well as to generate leads. We can roll up the number of
leads from online banners and other channels to derive the total number
of leads for the campaign, but it is important to remember the additional
purpose of each channel. At this stage, it is important to calculate the
overall returns and returns by channel. Channels with higher returns
should be invested in more heavily. When returns are measured in nonfinancial terms, such as the number of leads, the cost per lead is the metric to optimize. Table 4-2 shows the key return or success metrics for the
company A campaign.
Table 4-2 The rollup of return metrics

Marketing communication channel   Total marketing cost (agency and media costs)   Leads    Cost per lead
Direct mail                       $500,000                                        5,000    $100
E-mail                            $300,000                                        9,000    $33
Online banner                     $250,000                                        1,250    $200
Paid search                       $750,000                                        10,000   $75
Total                             $1,800,000                                      25,250   $71
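The rollup in Table 4-2 is a simple division of cost by leads per channel; a minimal sketch using the figures from the table:

    # Sketch: cost-per-lead rollup across channels (figures from Table 4-2)
    channels = {
        "Direct mail":   {"cost": 500_000, "leads": 5_000},
        "E-mail":        {"cost": 300_000, "leads": 9_000},
        "Online banner": {"cost": 250_000, "leads": 1_250},
        "Paid search":   {"cost": 750_000, "leads": 10_000},
    }

    total_cost = sum(c["cost"] for c in channels.values())
    total_leads = sum(c["leads"] for c in channels.values())

    for name, c in channels.items():
        print(f"{name}: cost per lead = ${c['cost'] / c['leads']:.0f}")
    print(f"Total: cost per lead = ${total_cost / total_leads:.0f}")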
The third step in selecting appropriate metrics is to identify the channel-specific return or success metrics. These channel-specific return or success
metrics can be viewed as mini goals or intermediate return or success metrics. In the example of company A, the ultimate return or success metric
of the campaign is the number of leads. Within the online banner channel, the number of responses and the number of clicks can be considered
as intermediate return or success metrics for this particular channel.
The fourth step is to identify all potential operational metrics by marketing communication channel. Operational metrics are usually channel-specific and may not be rolled up across channels. For example,
click-through rate is an operational metric that only applies to online
channels and cannot be applied to direct mail. Since there may be hundreds of operational metrics, it is important to take the additional step of
identifying those with the highest impact on the returns of the channel.
The fifth and last step in the identification process is to uncover operational metrics highly associated with channel returns. This is where data
mining is extremely useful and should be fully leveraged. Appropriate
adjustments in the values of operational metrics within each channel can
maximize the overall returns of the campaign. For example, in the case of
online banners, the number of impressions, click-through rate, and response
rate are potential operational metrics. By improving any of these three operational metrics, company A can increase the number of leads generated by
its online banners and therefore increase the number of leads generated by
the overall campaign. Table 4-3 through Table 4-6 show the common operational metrics for direct mail, e-mail, online banner, and search marketing
from the awareness stage to the conversion stage of the sales cycle.
Table 4-3 Common operational metrics for direct mail from the awareness stage to the conversion stage

Marketing communication channel: direct mail
From awareness to interest and relevance: response rate (responses / mail quantity)
From interest and relevance to conversion: lead conversion rate (leads / responses)
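The operational metrics in Tables 4-3 through 4-6 are simple ratios of counts at adjacent stages; a hedged sketch of how they might be computed (the counts below are hypothetical):

    # Sketch: operational metrics as ratios of stage-to-stage counts (hypothetical numbers)
    mail_quantity, responses, leads = 1_000_000, 20_000, 5_000
    impressions, clicks = 125_000_000, 500_000

    response_rate = responses / mail_quantity     # direct mail: responses / mail quantity
    lead_conversion_rate = leads / responses      # leads / responses
    click_through_rate = clicks / impressions     # online channels: clicks / impressions

    print(f"Response rate: {response_rate:.2%}")
    print(f"Lead conversion rate: {lead_conversion_rate:.2%}")
    print(f"Click-through rate: {click_through_rate:.3%}")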
■ Multi-channel campaign
performance optimization
The purpose of campaign performance optimization is to maximize the
cost-effectiveness level of a return metric by leveraging what can be
Table 4-4 Common operational metrics for e-mail from the awareness stage to the conversion stage

Marketing communication channel: e-mail
From awareness to interest and relevance: open rate (e-mails opened / e-mails delivered); click-through rate (clicks on links in e-mails / e-mails delivered); response rate (responses to an offer / e-mails delivered)
From interest and relevance to conversion: lead conversion rate (leads / responses)
Table 4-5 Common operational metrics for online banner from the awareness stage to the conversion stage

Marketing communication channel: online banner
From awareness to interest and relevance: click-through rate (clicks on banner / impressions); response rate (responses to an offer / clicks on banner)
From interest and relevance to conversion: lead conversion rate (leads / responses)
learned from the past. Campaign optimization can be viewed as an extension of campaign reporting. Additional steps to optimize future campaigns
need to be taken after the five-step campaign performance reporting process introduced in the previous section is completed, as shown in Figure 4-2.
The first step is to identify the metrics to optimize. Optimization metrics should be aligned with the overall campaign return or success metrics.
The rationale is apparent: performance optimization aims to cost effectively increase the value of the return or success metrics. In the example
of company A, the overall return or success metric is the number of leads.
Company A can keep spending money to generate more leads but at
some point it will no longer be cost-effective to do so given the potential
Table 4-6 Common operational metrics for paid search from the awareness stage to the conversion stage

Marketing communication channel: paid search
From awareness to interest and relevance: click-through rate (clicks on webpage links / impressions); response rate (responses to an offer / clicks on webpage links)
From interest and relevance to conversion: lead conversion rate (leads / responses)
Figure 4-2 Campaign optimization process:
Step 1: Identify metrics for optimization.
Step 2: Determine optimization timeframe, frequency, and tool.
Step 3: Identify key operational metrics with the highest impact on the optimization metrics.
Step 4: Identify factors that influence key operational metrics values.
Step 5: Apply learning to future campaign planning and execution.
diminishing returns. The solution is to maximize the number of leads
with a cost per lead threshold in place. How can company A derive
this cost efficiency threshold? One way is to derive the expected average profit per lead and make sure that the cost will never run above the
expected profit.
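A minimal sketch of this cost-efficiency rule, with assumed figures: spend on additional leads only while the cost per lead stays below the expected average profit per lead.

    # Sketch: cost-per-lead threshold derived from expected profit per lead (assumed figures)
    expected_profit_per_lead = 120.0   # hypothetical average profit a lead is expected to generate
    planned_spend = 90_000.0           # proposed incremental budget
    expected_extra_leads = 900         # leads the spend is expected to produce

    cost_per_lead = planned_spend / expected_extra_leads
    if cost_per_lead <= expected_profit_per_lead:
        print(f"Proceed: cost per lead ${cost_per_lead:.0f} is within the ${expected_profit_per_lead:.0f} threshold")
    else:
        print(f"Stop: cost per lead ${cost_per_lead:.0f} exceeds the ${expected_profit_per_lead:.0f} threshold")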
The second step is to determine the optimization time frame and the
tool required for optimization. How frequently an optimization strategy
is revisited depends on the marketing communication channel utilized.
Sufficient time needs to pass before any conclusions about the strategy
are drawn. The time frame and frequency of optimization should be
close to the time frame required for a marketing communication channel
to achieve its full result. For example, results on metrics such as clicks,
click-through rate, and responses for banner, search, and e-mail can
usually be measured close to real time with appropriate online analytic
tools. In this case, campaign performance can be tracked in real time and
optimized accordingly. Marketing dollars can be shifted from underperforming media sites to over-performing ones. In contrast, direct mail
requires a much longer time period (one month or more) to generate final
response results.
In recent years, there has been significant advancement in analytic and
optimization tools, including management and optimization tools for the
web, ad serving, search, campaign management, lead and sales tracking,
as well as customer relationship management (CRM). Before implementing full-scale deployment of any analytic and optimization tools, it is
important to test the tool through a business proof of concept pilot with
the tool vendors.
The third step in campaign optimization is to analyze the data so far collected, and to identify key operational metrics based on their impact on
the optimization metrics. In the example of company A, it is clear that the
company needs to optimize the number of leads acquired cost effectively.
To accomplish this, we need to identify the most important contributors to
the number of leads. Response to lead conversion and number of responses
are examples of these contributors. An increase in response to lead conversion rate at a given cost and a given number of responses will result in an
increase in the number of leads. Alternatively, an increase in the number of
responses at a given cost and a given response to lead conversion rate can
result in an increase in the number of leads. Discovering these influential
factors is crucial for optimizing future campaign performance.
There are occasions when it is not obvious which factors are influential. When this is the case, we can use data mining techniques to uncover
hidden relationships. Data mining techniques such as Classification and
Regression Tree (CART) can be used to analyze the relationship between
potential influential factors and the number of leads acquired at a given
cost. Logistic regression is another data mining technique that can be
used to build models to target those who are more likely to convert to
leads. Chapter 7 discusses various data mining techniques that can be
leveraged to uncover hidden relationships.
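As an illustration of the logistic regression idea mentioned above (the features and data below are invented; this is a sketch, not the book's model), scikit-learn can score responders by their likelihood to convert to leads:

    # Illustrative sketch: scoring responders by likelihood to convert to leads
    from sklearn.linear_model import LogisticRegression

    # Hypothetical features per responder: [prior purchases, web visits last month, company size band]
    X = [[0, 2, 1], [3, 8, 2], [1, 1, 1], [5, 12, 3], [0, 0, 1], [2, 6, 2]]
    y = [0, 1, 0, 1, 0, 1]   # 1 = converted to a lead, 0 = did not

    model = LogisticRegression().fit(X, y)
    scores = model.predict_proba([[1, 4, 2], [0, 1, 1]])[:, 1]   # probability of converting to a lead
    print(scores)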
The fourth step in campaign optimization is to identify attributes that
can be manipulated to influence the values of key operational metrics.
In the previous example of an online banner, marketing messaging is an
attribute that may be changed to drive responses to lead conversion rate
and hence the number of leads.
Where can we find potential influential attributes? Data pertaining to
any of the following areas can be a candidate for such factors:
● Target-audience characteristics such as lifestyle and social economic status
● Stages in the sales cycle such as awareness stage and conversion stage
● Attributes of marketing communication such as creative and messaging
● Marketing and sales operations such as customer services and fulfillment
● Features of marketing campaigns such as rebates and discounts.
In the fifth step, the learning from the current campaign should be applied to future campaigns to optimize marketing planning and execution.
Ongoing tests and learning environments can lead to optimal marketing
efforts and to sustained high returns on marketing investment.
Uncovering revenue-driving factors
Revenue is often the return or success metric that most marketing
executives choose to focus on. Given the importance of revenue, we will
go over some common practices on how to best identify revenue-driving
factors.
The key to understanding revenue-driving factors is to understand the
target audience and where they are in the sales cycle. Customer segmentation is the tool for understanding the target audience. There are numerous ways of segmenting and profiling a customer base. The following
are some common practices for uncovering revenue opportunities by
segmentation.
● Segmentation of existing customers by value: The value of a customer
is defined as the revenue generated by the customer over a period of
time. Potential future revenue opportunities can be better identified as
a result of this type of value segmentation. Marketing dollars need to
be allocated to those customer segments with high growth potential to
maximize revenue. In addition, cross-sell and up-sell can be leveraged
to increase revenue. In Chapter 7, we will discuss a common cross-sell
and up-sell analytic technique called association analysis (market basket
analysis).
● Segmentation of the target audience by share of wallet: Share of wallet
is defined as the total spending on a brand over the total spending on
the category that the brand is under. Consider the supermarket business as an example. Customers who spend most of their grocery dollars with a particular supermarket brand (let us call it supermarket A)
are the primary customers of this supermarket brand. In other words,
supermarket A owns a large share of wallet of these customers. Those
customers who spend most of their grocery dollars at competitors’ stores
but also shop at this supermarket brand are its secondary customers
with a small share of wallet. Supermarket A can increase its revenue
by either increasing the share of wallet of its secondary customers, or
increasing the purchase amounts of its primary customers (a share-of-wallet calculation sketch follows this list).
● Segmentation of the target audience by likelihood to buy: We need
to know where the various audiences are in the sales cycle. Do they
barely know the brand and products? Do they know the products
so well that they are ready to make purchases? Based on the insight
gleaned from segmentation, different types of marketing programs
can be created to target different subsegments in the audiences. For
example, awareness programs are used to educate those at the awareness stage. Lead generation programs are used to target those who are
ready to purchase. Data mining techniques such as logistic regression
can be leveraged for targeting marketing. These techniques will be
discussed in detail in Chapter 7.
● Segmentation of the target audience by needs: Marketing the right
products to the right customers increases returns of marketing programs.
Programs targeting specific audiences with specific products are more
effective than generic programs.
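The share-of-wallet idea above is a simple ratio of brand spending to category spending; here is a minimal sketch with hypothetical grocery data:

    # Sketch: share of wallet = spending on a brand / total spending in the category (hypothetical data)
    spend_at_supermarket_a = {"cust_1": 450.0, "cust_2": 120.0, "cust_3": 300.0}
    total_grocery_spend = {"cust_1": 500.0, "cust_2": 600.0, "cust_3": 320.0}

    for cust, brand_spend in spend_at_supermarket_a.items():
        share = brand_spend / total_grocery_spend[cust]
        segment = "primary" if share >= 0.5 else "secondary"
        print(f"{cust}: share of wallet {share:.0%} ({segment} customer)")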
In summary, success in multiple-channel marketing campaigns requires
consistent focus on the appropriate metrics and analysis of the interrelationships between these metrics. Reporting and optimization, two
seemingly tactical areas of marketing, can often drive important marketing
investment strategies.
CHAPTER 5
Understanding the Market through Marketing Research
Marketing research is a powerful vehicle for uncovering and assessing
market opportunities. In particular, it is an effective tool for addressing
the following three sets of questions to ensure effective marketing investment planning.
● Where is the market opportunity? What is the size and growth rate of the opportunity?
● Who is the target audience? What are their profiles and characteristics?
● Why do consumers or businesses choose one product over another? Why do they choose one brand over another?
In this chapter, we give an overview of marketing research and its
applications to enhancing marketing returns. We start out with a synopsis on the application of marketing research to understanding the market
and then will discuss marketing research as a discipline.
■ Market opportunities
Understanding potential market opportunities is the first step in marketing investment planning. Solid knowledge of the market structure and
market opportunities minimizes risk and increases returns on marketing investment. A market opportunity can be described by the following
parameters.
● Market size
● Market growth
● Market share.
Market size
One way to describe and quantify a market opportunity is through
market size information. Market size information can be segmented by
attributes such as geography or industry. Syndicated research companies,
such as IDC, Gartner, Forrester, Hitwise, Nielsen Media Research, Jupiter
Research, and comScore, provide market size information for standard products. For nonstandard products or new products, customized
research is required for gathering market size information. Customized
research is usually more expensive than syndicated research. The following is an example of market size syndicated research and data.
In the last few years, search marketing has grown rapidly as a marketing and advertising vehicle. Online search marketers are always interested in their key word rankings (positions) with key search engines.
Based on a comScore report (comScore Networks, 2006), the total number
of searches in the US grew from 4.95 billion in January 2005 to 5.48 billion in January 2006 at a growth rate of 10.7%. Table 5-1 shows that as
of January 2006, Google had the largest market share (41.4%), followed
by Yahoo (28.7%), MSN (13.7%), Time Warner Network (9.6%), and Ask
Jeeves (5.6%). This market share information is important for determining
where to invest to maximize exposure to potential customers. The search
engines with the largest search market size are usually the most attractive
marketing and advertising partners for search marketers.
Table 5-1 Total Internet searches and share of online searches by search engine (source: comScore 2006)

                                     Jan. 2005   Jan. 2006
Total Internet searches (billion)    4.95        5.48
Share of searches by engine (%)
  Google sites                       35.10       41.40
  Yahoo! sites                       31.80       28.70
  MSN–Microsoft sites                16.00       13.70
  Time Warner network                9.60        7.90
  Ask Jeeves                         5.10        5.60
Market size terminology
We must differentiate between total available market and total addressable market. Syndicated research companies often provide market size
information on the total available market. Total addressable market is a
subset of the total available market. Due to a variety of factors, companies
may only have access to a subset of the total available market. This subset
is called the addressable market. For example, a company with no infrastructure in Asia cannot sell into this geographic portion of the market.
Therefore, for this particular company, the total addressable market is the
total available market minus the Asian market.
Factors that impact market-opportunity
dynamics
Many factors can impact market opportunity and its growth. Understanding these factors allows for better marketing planning and more
effective buy-in from marketing executives. The most important factors
are macroeconomic trends, emerging technologies, and customer needs.
Impact of macroeconomic factors on market
opportunities
A large number of elements within the macroeconomy affect market
dynamics. We explore the most significant ones: Gross Domestic Product
(GDP) growth, geopolitical factors, oil prices, exchange rates, interest
rates, unemployment rates, product life cycle, and corporate profits.
● GDP growth: The growth of GDP not only is an indicator of market
growth but also affects confidence in the market place and can drive
subsequent growth. In other words, GDP growth is both a reflection and a potential driver of future market growth. According to the
Bureau of Economic Analysis (2006, http://www.bea.gov/glossary/
glossary.cfm), the GDP of a country is the market value of goods and
services produced by labor and property in that country, regardless of
the nationality of the labor and of those who own the property. The
Gross National Product (GNP) of a country is the market value of
goods and services produced by labor and property supplied by the
residents of that country, regardless of where they are located. GDP
replaced GNP as the primary measure of US production in 1991. GDP
is a composite measure based on various types of goods and services. Since GDP is a composite of growths in the various sectors of the
economy, the growth of the larger economic sectors, such as manufacturing, financial services and government spending, tend to have
more influence on the overall GDP growth. Consider the so-called
nonresidential equipment and software investment sectors. Figure 5-1
shows that GDP was highly correlated with the nonresidential equipment and software investment sectors from Q1, 2005 to Q1, 2006. It is
also noticeable that the nonresidential equipment and software investment sectors tend to have wider swings than GDP. This is likely due to
the after-shock effects of GDP data releases, suggesting that a boost in
GDP itself tends to boost confidence in the market place and thereby
tends to indirectly boost subsequent investment in the two sectors.
● Political uncertainty: Political uncertainty, like economic uncertainty,
tends to trigger or hold back business investments and consumer
spending. The war in Iraq and the kidnappings of foreigners in the
Middle East, for instance, have made investors think twice about their
involvement in rebuilding the region. The threat of terrorism (including cyber terrorism) has boosted US government spending on defense
and security since 2001, as illustrated in Figure 5-2.
Figure 5-1 GDP versus nonresidential equipment and software investment, quarterly growth (%), 2005:Q1 to 2006:Q1. Source: The Bureau of Economic Analysis, 2007.
Figure 5-2 US GDP versus federal national defense investment, annual growth (%), 2001 to 2003. Source: The Bureau of Economic Analysis, 2007.
● Oil prices: Oil prices usually fluctuate with the political climate in oil-exporting areas such as the Middle East, Latin America, and Africa. The
Organization of Petroleum Exporting Countries (OPEC) often adjusts
its oil production level based on geopolitical factors. When oil prices go
up, costs of production for goods go up and investment is scaled back.
● Exchange rate: The exchange rate has an effect on imports and exports,
which in turn affect GDP growth. A weaker currency benefits exports
and GDP if all the other factors are kept constant. A stronger currency
inhibits exports and can result in increased inflationary pressures.
● Interest rate: Higher interest rates usually have a deterring effect on
capital and consumer spending as borrowing costs increase.
● Unemployment rate: The employment rate is a reflection of corporate
spending and hiring. An increase in unemployment is an indication
of weak confidence in the market place and a decrement in the rate of
business expansion.
● Product life cycle: Product life cycle is another factor that influences
market opportunities. When a product is approaching the end of its life, the market tends not to invest in this product and as a result its
market size shrinks. Customers tend to avoid investing in a product
about to become obsolete, and often prefer to wait for the next generation of the product, whose market size may then grow over time.
● Corporate profits: Increase in corporate profits usually has a positive
impact on corporate spending in the long run, if not in the short run.
Corporations are ready to spend more when executives feel comfortable with the state of business in their firms. Investment banks and
research firms regularly survey CEOs, CFOs, and CIOs to gauge their
feelings about the economic climate.
All of the factors discussed above have either positive or negative
impacts on market growth. Therefore, paying close attention to these factors is extremely important. A number of marketing research companies
designate analysts and experts to analyze these factors on a regular basis
and compile market size forecasts based on these factors.
Impact of emerging technologies and customer needs on the
market
Technology breakthroughs often have an impact on market growth,
although the growth may be initially small as investors wait for full-scale
adoption of the new technology.
It is important to track emerging technologies or products that may
replace existing technologies or products (and eventually eliminate current markets), supplement an existing technology or products (and
thereby impact a current market either positively or negatively), or create
completely new markets. For example, adoption of radio frequency identification (RFID) in the retail market has driven a demand for this new
technology. At one point, Wal-Mart, the largest US retailer, requested
some of its suppliers to become RFID-compliant by 2005. This created a
sizeable market for RFID products and services.
The creation of the credit card market, which is now a large financial
market, is the result of customers’ demand for convenience. Diner ’s Club
introduced the first credit card in the 1950s. American Express and Bank
of America started issuing their cards in 1958. Over the years, the credit
card became indispensable to most consumers and businesses, and as a
result, a new and large financial market emerged. The size of credit card
receivables in 2001 was over $600 billion in the US.
Market growth trends
Industry or technology analysts often express market growth for the following years (usually a total of five years) with a metric called compound
annual growth rate (CAGR). The standard formula for computing CAGR
is as follows:
CAGR = (Xe / Xb)^(1/(t-1)) - 1          (5.1)
where Xe is the market size forecast for time period t, Xb the market size
forecast for time period 1, and t the number of years in the forecast time
period.
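A small sketch of equation (5.1) with illustrative forecast figures:

    # Sketch: compound annual growth rate (CAGR) per equation (5.1), illustrative figures
    x_b = 850.0    # market size in the first year of the forecast period (e.g., $850m)
    x_e = 1200.0   # market size forecast in the final year of the period
    t = 5          # number of years in the forecast period

    cagr = (x_e / x_b) ** (1.0 / (t - 1)) - 1.0
    print(f"CAGR over {t} years: {cagr:.1%}")   # roughly 9%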
Market share
Market share data indicates how well a company is positioned in a particular market. Those participants with the highest market shares are market
leaders, and those with lowest market shares are market laggards. Low
market share indicates an opportunity for growth. A firm with a large
market share will find it harder to grow further and may seek or create
another market.
Market share can be expressed in terms of units sold or in dollar
amount. The market share of a company during a given time period
measured in dollar amount is
Revenues of the company / (Revenues of the company + Total revenues of its competitors)          (5.2)
Both the denominator and the numerator are in dollar amounts, and
therefore market share is a dimensionless quantity.
The market share of a company during a given time period measured
in number of product units sold is
Units sold by the company / (Units sold by the company + Total units sold by its competitors)          (5.3)
As before, the market share in this case is a dimensionless ratio.
We can also compute the market share of a single product category by
using only the revenue or units sold in that category. For example, for
a consumer electronics manufacturer, its share in the camera category
for a given time period can be computed by either of the two following
expressions:
Camera revenues of the company / (Camera revenues of the company + Total camera revenues of its competitors)          (5.4)

Units of cameras the company sold / (Units of cameras the company sold + Total units of cameras sold by its competitors)          (5.5)
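These ratios translate directly into a calculation; a small sketch with hypothetical camera revenues:

    # Sketch: market share by revenue, per equation (5.4), with hypothetical figures
    company_camera_revenue = 180.0       # $ millions
    competitors_camera_revenue = 820.0   # $ millions, all competitors combined

    share = company_camera_revenue / (company_camera_revenue + competitors_camera_revenue)
    print(f"Camera market share: {share:.0%}")   # 18%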
■ Basis for market segmentation
The ultimate goal of market segmentation is to create homogeneous
segments where constituencies within each segment react uniformly to
marketing stimuli. Market segmentation enables formulation of optimal
marketing targeting strategies for each segment. The bases of segmentation for a particular product are market size, market growth rate, and
market share.
The first step in segmentation analysis is to identify the product of
interest. In case of consumer banking industry, for instance, products
and services can be broken down in segments such as checking, savings,
credit card, line of credit, home equity, home mortgage, insurance, and
brokerage. A bank may examine the market size of each product it offers
or plans to offer and choose to focus on those products or services that
have the largest market size, the highest growth rates, or the lowest market shares. We next consider a hypothetical case study on segmentation.
For Company W, the total market size of product A, B, C, and D in 2007
was $927m. In this case, market segmentation results in four distinct segments, illustrated in Figure 5-3.
Figure 5-3 Market segmentation by market size and growth:
Segment 1 (Product D): market size $5 million; annual growth 2%; priority: low
Segment 2 (Product C): market size $132 million; annual growth 10%; priority: high
Segment 3 (Product B): market size $525 million; annual growth 5%; priority: high
Segment 4 (Product A): market size $265 million; annual growth 15%; priority: high
● Segment one: small size and low growth
● Segment two: small size and high growth
● Segment three: large size and low growth
● Segment four: medium size and high growth.
Segments three and four represent the most attractive opportunities,
followed by segment two. Segment one represents the least attractive
opportunity.
Market segmentation by market size, market
growth, and market share: case study one
So far, we have discussed market size and market growth; we now revisit
the last hypothetical case study by adding the market-share consideration
(Figure 5-4).
● Segment one: small size, low growth, and medium market share
● Segment two: small size, high growth, and low market share
● Segment three: large size, low growth, and medium market share
● Segment four: medium size, high growth, and high market share.

Figure 5-4 Market segmentation by market size, growth, and share:
Segment 1 (Product D): market size $5 million; annual growth 2%; market share 50%; incremental opportunity $2.5 million; priority: fourth
Segment 2 (Product C): market size $132 million; annual growth 10%; market share 30%; incremental opportunity $92.4 million; priority: first
Segment 3 (Product B): market size $525 million; annual growth 5%; market share 90%; incremental opportunity $52.5 million; priority: third
Segment 4 (Product A): market size $265 million; annual growth 15%; market share 70%; incremental opportunity $79.5 million; priority: second
Market share information provides additional insight on where true
market opportunities lie. Although the total market size for segment four
is $265m, the incremental market opportunity for Company W is only
$79.5m.
After consideration of market share data in the four segments, we conclude that segment two is the most attractive segment, followed by segments four, three, and one.
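The incremental opportunity used in Figure 5-4 is market size multiplied by the share not yet captured; a small sketch (figures from the case study) that, for this example, reproduces the priority order:

    # Sketch: ranking segments by incremental opportunity = market size x (1 - market share)
    segments = {
        "Segment 1 (Product D)": {"size": 5.0,   "share": 0.50},
        "Segment 2 (Product C)": {"size": 132.0, "share": 0.30},
        "Segment 3 (Product B)": {"size": 525.0, "share": 0.90},
        "Segment 4 (Product A)": {"size": 265.0, "share": 0.70},
    }

    for name, s in segments.items():
        s["incremental"] = s["size"] * (1.0 - s["share"])   # $ millions not yet captured

    ranked = sorted(segments.items(), key=lambda kv: kv[1]["incremental"], reverse=True)
    for name, s in ranked:
        print(f"{name}: incremental opportunity ${s['incremental']:.1f}m")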
We recommend the following three-step process to incorporate market
opportunity information into marketing planning.
● Identification of the market size and its geographic and product breakdown: Table 5-2 is a template for compiling the relevant information.
Product or geography with the largest market size is often the main
revenue source for a company.
● Identification of high growth market opportunities: Targeting high
growth opportunities enables revenue generation and long-term
Table 5-2 Template for market size by product and geography

              Product A ($)   Product B ($)   Product C ($)   Product D ($)   Total ($)
Region 1
Region 2
Region 3
Region 4
All regions
competitiveness. Table 5-3 is a template for documenting market growth
information.
Table 5-3 Template for annual market growth rate by product and geography

              Product A (%)   Product B (%)   Product C (%)   Product D (%)   Total (%)
Region 1
Region 2
Region 3
Region 4
All regions
● Identification of the market share: Market share information provides
insight on the actual room for growth. Table 5-4 shows a layout for
documenting the market share information.
Table 5-4 Template for market share by product and geography

              Product A (%)   Product B (%)   Product C (%)   Product D (%)   Total (%)
Region 1
Region 2
Region 3
Region 4
All regions
Using market research and data mining for
building a marketing plan
It is common practice for firms to set up their revenue goals using market
data as the benchmark. A particular company may set a revenue growth
goal of 15% just to outperform the anticipated market growth of 10% and
to gain market share.
It is also a very common practice for companies to apply an arbitrary
percentage (usually a single digit) to their revenue goal to set their
marketing budgets. This is particularly prevalent in the high-tech industry. For instance, a high-tech company may expect to generate $10b in
revenue and plans to allocate 5% of the expected revenue, or $500m, as its
marketing budget. A more information-driven approach is to apply the
marketing spending modeling techniques discussed in Chapter 2 to analyze historical sales and marketing spending data to produce an optimal
marketing budget and allocation.
Marketing planning based on market
segmentation and overall company goal: case
study two
Based on the previous market segmentation case study illustrated in
Figure 5-4, the most attractive opportunities are segments two, three, and
four. Company X is one of the companies competing in these segments. A
2007 marketing plan for Company X will be created based on the market
segmentation information.
The first step in creating the marketing plan is to populate the template
in Table 5-5 (template one) with the market data of 2006 and 2007. Then,
the incremental market size growth from 2006 to 2007 and the percent
contribution to the overall market size growth is populated for each segment. The total market is expected to grow 8.5%, or $72m, from 2006 to
2007. Out of the $72m, $35m (49% of total growth), $25m (35% of total
growth), and $12m (16% of total growth) are from products A, B, and C
respectively.
The second step is to incorporate the actual revenue and market share
of 2006 into template two, as shown in Table 5-6.
The third step is to incorporate revenue and market share goals into template two in Table 5-6. To determine a realistic revenue growth goal for each
of the three segments, we need to evaluate the historical growth rate of
each product. Table 5-7 shows the 2007 revenue and market share goals for
Company X, based on its historical marketing spending and revenue data.
The company is expected to grow faster than the market in the products B
Table 5-5 Template one – identification of market size of 2006 and 2007, and growth from 2006 to 2007
(The template also contains two columns to be filled in later: Company X incremental revenue 2007 and Company X return on investment 2007.)

Segment 4, Product A: 2006 market size $230m; 2007 market size $265m; growth 15%; incremental market growth $35m; 49% of incremental revenue
Segment 3, Product B: 2006 market size $500m; 2007 market size $525m; growth 5%; incremental market growth $25m; 35% of incremental revenue
Segment 2, Product C: 2006 market size $120m; 2007 market size $132m; growth 10%; incremental market growth $12m; 16% of incremental revenue
Total (segments 2, 3, 4): 2006 market size $850m; 2007 market size $922m; growth 8.5%; incremental market growth $72m; 100% of incremental revenue

Table 5-6 Template two – incorporation of actual company revenue and market share information from 2006
(The columns for the 2007 incremental revenue goal, percent revenue increase, 2007 revenue, and 2007 market share goal remain blank at this step.)

Segment 4, Product A: 2006 Company X revenue $60m; Company X 2006 market share 60/230 = 26%
Segment 3, Product B: 2006 Company X revenue $120m; Company X 2006 market share 120/500 = 24%
Segment 2, Product C: 2006 Company X revenue $80m; Company X 2006 market share 80/120 = 67%
Total (segments 2, 3, 4): 2006 Company X revenue $260m; Company X 2006 market share 260/850 = 31%
Table 5-7 Template three – incorporation of company revenue and market share goals by product

Segment 4, Product A: 2006 revenue $60m; 2007 incremental revenue goal $9m; revenue increase 15%; 2007 revenue goal $69m; 2006 market share 60/230 = 26%; 2007 market share goal 26.0%
Segment 3, Product B: 2006 revenue $120m; 2007 incremental revenue goal $12m; revenue increase 10%; 2007 revenue goal $132m; 2006 market share 120/500 = 24%; 2007 market share goal 25.1%
Segment 2, Product C: 2006 revenue $80m; 2007 incremental revenue goal $10.2m; revenue increase 13%; 2007 revenue goal $90.2m; 2006 market share 80/120 = 67%; 2007 market share goal 68.3%
Total (segments 2, 3, 4): 2006 revenue $260m; 2007 incremental revenue goal $31.2m; revenue increase 12%; 2007 revenue goal $291.2m; 2006 market share 260/850 = 31%; 2007 market share goal 31.6%
Table 5-8 Year 2007 budget allocation based on historical data and modeling

Segment 4, Product A: 2006 revenue $60m; 2007 incremental revenue goal $9m; revenue increase 15%; 2007 revenue goal $69m; historical incremental revenue over budget ratio 1.29; 2007 budget $7.0m; 2006 market share 26%; 2007 market share goal 26%
Segment 3, Product B: 2006 revenue $120m; 2007 incremental revenue goal $12m; revenue increase 10%; 2007 revenue goal $132m; historical incremental revenue over budget ratio 1.33; 2007 budget $9.0m; 2006 market share 24%; 2007 market share goal 25%
Segment 2, Product C: 2006 revenue $80m; 2007 incremental revenue goal $10.2m; revenue increase 13%; 2007 revenue goal $90.2m; historical incremental revenue over budget ratio 1.70; 2007 budget $6.0m; 2006 market share 67%; 2007 market share goal 68%
Total (segments 2, 3, 4): 2006 revenue $260m; 2007 incremental revenue goal $31.2m; revenue increase 12%; 2007 revenue goal $291.2m; historical incremental revenue over budget ratio 1.42; 2007 budget $22.0m; 2006 market share 31%; 2007 market share goal 31.6%
and C categories and grow at the same pace as the market in the product
A category. Achieving the revenue and growth goals for products A, B, and C will lead to
a market share increase from 31% to 31.6%, a 0.6 percentage point increase. As illustrated
in this example, a drastic increase in market share is hard to achieve. An
increase of $31.2m in revenue from products A, B, and C only results in a
0.6 percentage point market share gain.
The fourth step is to use the results from the third step to populate
template three with budget information, as illustrated in Table 5-8. The
total budget is $22m. This is exactly where we can tie together market opportunity information, marketing budget, and overall company
revenue.
Segments four and three have low market shares. There is significant
competition in these two segments, and therefore they require significant investment to gain new customers and market share. This competitive situation is reflected in the lower historical revenue to budget ratios.
Segment two has a high market share and less competition. Market share
gains and losses are highly correlated with competitive pressure, as will
be discussed later in this chapter.
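A minimal sketch of the budget step behind Table 5-8 (figures from the case study): each segment's incremental revenue goal divided by its historical incremental-revenue-to-budget ratio gives the segment budget.

    # Sketch: deriving 2007 budgets from incremental revenue goals and historical ratios (Table 5-8)
    plan = {
        "A": {"incremental_revenue_goal": 9.0,  "revenue_over_budget_ratio": 1.29},
        "B": {"incremental_revenue_goal": 12.0, "revenue_over_budget_ratio": 1.33},
        "C": {"incremental_revenue_goal": 10.2, "revenue_over_budget_ratio": 1.70},
    }

    total_budget = 0.0
    for product, p in plan.items():
        budget = p["incremental_revenue_goal"] / p["revenue_over_budget_ratio"]
        total_budget += budget
        print(f"Product {product}: budget ${budget:.1f}m")
    print(f"Total budget: ${total_budget:.1f}m")   # about $22m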
■ Target-audience segmentation
The target audience of a segment is a group of individuals, households,
or businesses that possess similar characteristics and behavior. The following section gives an overview on the common attribute groups used
to describe a target audience.
Target-audience attributes
● Demographic or corporate attributes: These attributes describe the
general characteristics of an individual, a household, or a company.
Age, gender, ethnicity, marital status, education, life stage, personal
income, and home ownership are examples of individual demographic attributes. Household income and household size are examples of household demographic attributes. Company size, company
annual revenue, industry or Standard Industry Code (SIC), and company start year are examples of corporate attributes.
● Social–economic attributes: These attributes describe the social economic status of a household or an individual. They are usually constructed based on zip code level census information by data vendors
or marketing research companies (e.g., Personicx, Prizm, Microvision,
Cohorts, and IXI). For example, one of the attributes used by Personicx
is referred to as ‘established elite.’ Individuals with this attribute tend
to have a higher than average disposable income and a luxurious life style.
● Attitudinal attributes: These attributes describe an individual’s hobbies, interests, and social, economic, or political views, such as interest
in art, space and science, sports, cooking, tennis, travel, politics, economics, antique, or fitness.
● Purchase behavior attributes: These attributes describe where an individual, a household, or a business is in a sales cycle. Stages in the sales
cycle such as awareness, interest and relevance, consideration, purchase, and loyalty and referral, are described in detail in Chapter 3.
● Need attributes: These attributes describe a customer’s or a prospect’s
need for acquiring or inquiring about a product. Need for pain relief and
need for wireless Internet connection are examples of need attributes.
● Marketing medium preference attributes: These attributes describe an
individual’s preference on how to be contacted, receive information,
or interact with marketers. In-person visits, direct mail, print, TV, telemarketing, billboard, newspaper print ad, and magazine inserts are
examples of offline medium. E-mail, online banner, search, community, podcast, and blog are instances of online medium.
Types of target-audience segmentation
There are multiple ways of segmenting a target audience. Segmentation
of the audience needs to be aligned with business objectives. The four
most common criteria for segmentation are demographics, needs, product purchased, and value, as illustrated in Figure 5-5.
● Demographics-based segmentation: This is the most common segmentation approach. It gives general descriptions of the various segments
in the target audience. This type of segmentation is very useful for
providing insight and ideas regarding marketing creative, offers, and
messages.
● Need-based segmentation: This type of segmentation classifies the
audience by their need and is useful for constructing relevant product
or service offers to potential customers.
● Product purchased or installed based segmentation: This type of segmentation classifies the audience by what they have purchased or
deployed at their sites. This information is useful for driving targeted
cross-sell and up-sell marketing strategies and tactics.
● Value-based segmentation: This type of segmentation classifies the
audience by their value, often derived by their total dollar amount of
purchase during a period of time. This is a very practical approach as
the eighty-twenty rule shows that 80% of a marketer’s revenues often
comes from the top 20% of the customers with the highest values.
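As an aside, a minimal sketch of the eighty-twenty check behind value-based segmentation, using invented purchase amounts:

    # Sketch: value-based segmentation / eighty-twenty check on hypothetical purchase data
    purchases = {"c1": 12_000, "c2": 800, "c3": 450, "c4": 9_500, "c5": 300,
                 "c6": 700, "c7": 15_000, "c8": 250, "c9": 600, "c10": 400}

    ranked = sorted(purchases.items(), key=lambda kv: kv[1], reverse=True)
    top_n = max(1, len(ranked) // 5)                  # top 20% of customers by value
    top_revenue = sum(v for _, v in ranked[:top_n])
    total_revenue = sum(purchases.values())
    print(f"Top 20% of customers contribute {top_revenue / total_revenue:.0%} of revenue")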
Figure 5-5 Common segmentation types: demographics, need, install base or product purchase, and value (ranging from those typically covered by syndicated research to those requiring customized research).
Due to its cost effectiveness, syndicated research is a good starting
point for acquiring information on audience segmentation. However,
syndicated research is sometimes limited in that it cannot provide
in-depth value-based segmentation, which is best derived from the company’s internal sales data. Another limitation of syndicated research is
that a research firm may draw samples from and segment a population
that is not fully representative of the desired target audience.
We now consider a segmentation case study in the business-to-business
world. Figure 5-6 shows a small business customer value segmentation
for Company A. The customer segmentation is derived by a data mining
technique called Classification and Regression Tree (CART), which is discussed in detail in Chapter 7.
In the case study, the average purchase amount of small business customers is $5000. To accomplish the segmentation task, we use the CART
technique (we discuss this approach in detail in Chapter 7), which first
splits the sample by industry and identifies the professional service
industry as an industry with high average purchases of $8400 while the
other industries have an average of $2000. Within the professional industry, companies of size (in terms of number of employees) between 50 and
500 have average purchases of $12,000. This is the subsegment with the
highest value in the whole sample. The Tree technique splits the segment
in the other industries into two branches. Similar to the professional services industry, the companies with company size (in number of employees)
Figure 5-6 Small business customer average annual purchases of equipment X (CART segmentation):
Overall SB base: average purchase = $5000
  Industry: Professional services: average purchase = $8400
    # Employees < 50: average purchase = $6000
    # Employees between 50 and 500: average purchase = $12,000
  Industry: Other: average purchase = $2000
    # Employees < 50: average purchase = $1000
    # Employees between 50 and 500: average purchase = $5200
      # Branch offices < 2: average purchase = $3200
      # Branch offices between 2 and 5: average purchase = $4500
      # Branch offices > 5: average purchase = $5900
between 50 and 500 have higher average purchases of $5200, versus $1000 for the smaller companies (fewer than 50 employees). Among the companies with between 50 and 500 employees and more than five branch offices, the average purchases are $5900. This case study illustrates that with the application of an appropriate data mining segmentation technique, valuable customer subsegments can be uncovered.
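While CART itself is covered in Chapter 7, the flavor of this segmentation can be previewed with any off-the-shelf regression-tree implementation. The sketch below is a minimal illustration, assuming scikit-learn and pandas are available; the column names and the toy records are hypothetical stand-ins for Company A's internal sales data, not the actual case-study figures.

```python
# A minimal sketch of value-based segmentation with a regression tree,
# assuming scikit-learn and pandas. Column names and data are hypothetical.
import pandas as pd
from sklearn.tree import DecisionTreeRegressor, export_text

# Toy customer records: industry, size, branches, and annual purchase ($)
data = pd.DataFrame({
    "industry":        ["professional", "professional", "other", "other",
                        "professional", "other", "other", "professional"],
    "employees":       [120, 30, 40, 200, 350, 80, 260, 45],
    "branch_offices":  [3, 1, 1, 6, 4, 2, 7, 1],
    "annual_purchase": [12500, 6200, 900, 5400, 11800, 3100, 6000, 5800],
})

# One-hot encode the categorical industry column
X = pd.get_dummies(data[["industry", "employees", "branch_offices"]])
y = data["annual_purchase"]

# A shallow tree: each leaf is a candidate customer subsegment
tree = DecisionTreeRegressor(max_depth=3, min_samples_leaf=2, random_state=0)
tree.fit(X, y)

# Print the splits and the average purchase predicted in each leaf
print(export_text(tree, feature_names=list(X.columns)))
```

Reading the printed tree top-down reproduces the logic of Figure 5-6: the first split isolates the highest-value industry, and deeper splits on company size and branch offices expose subsegments with distinct average purchase levels.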
■ Understanding route to market
and competitive landscape by market
segment
Once market opportunities and the target audience are identified, the
next step is to assess the ability to compete in each segment through
understanding of route to market and the competitive landscape.
Routes to market
Customers purchase products through different avenues. A route to market is an avenue through which customers purchase products. In the case
of direct sales, customers purchase directly from marketers. In the case of
indirect sales, customers purchase from intermediaries. These intermediaries are called channel partners, retailers, resellers, or distributors.
For example, a company that designs and manufactures women's clothing may use several routes to market its products to customers. These routes include the company's own physical stores, print catalogs, e-catalogs, department retail stores, and online web sites.
Direct sales
In direct sales, products are sold directly to customers. In many cases, selling directly to customers is not a scalable business model, and the need
for leveraging a third-party reseller or distributor emerges. For example,
consumer goods manufacturers leverage retail stores such as consumer
electronics stores and supermarkets to distribute and sell their products.
Leading high-tech companies often leverage their channel partners to a
very large degree. In general, direct sales are a more common model in
business-to-consumer than in business-to-business.
Indirect sales
In indirect sales, companies leverage intermediaries to sell their products or services. The main advantage of this distribution method is scalability. These intermediaries are called distributors, resellers, partners, wholesalers, retailers, or channel partners. Good channel partners often enhance a company's revenue growth. Companies using channel partners often rely on the partners to contact and interact with end customers. As a result, these companies often do not have visibility into end customer information. However, channel partners can often provide end user data at an aggregate level. For example, instead of revealing actual end user names, distributors can provide reports on end user sales by vertical industry, company size, and geography.
There are different types of indirect sales models depending on the
number of intermediaries involved. A one-tier model refers to a model
where there is only one channel partner between a vendor and an end
user customer. A two-tier model is a model where there are two layers of
intermediaries between a vendor and an end user customer. In this case, a
vendor sells its product to a channel partner that then sells the product to
a reseller. The reseller then sells the products to an end user customer.
Revenue and investment flows
Understanding cost effectiveness by route to market is essential for establishing an optimal channel strategy balance. Figure 5-7 illustrates revenue
flows from direct and indirect sales.
[Figure 5-7 Direct and indirect revenue and investment flows: the company invests in partners and receives revenue and profit from partners, and invests in end customers and receives revenue and profit directly from end customers.]
It is important for firms to evaluate revenue and profit streams from intermediaries and end customers, as well as the firms' marketing investments for selling into both channels. If the returns on investment are significantly higher from intermediaries than from end customers, then it is necessary to explore the underlying reasons. It is possible that the market climate is such that end customers prefer buying from intermediaries. Objectively assessing returns on investment from both direct and indirect sales enables firms to embark on an optimal strategy of direct and indirect sales.
We now consider the market segmentation case study in Figure 5-3. Within each market segment, the contributions of direct sales and channel partner sales are rated as 'fair' or 'poor,' as shown in Table 5-9. The company relies mainly on channel partners or others for selling its products in segments three and four. On the contrary, the company sells most of its products directly to its customers in segment two. The degree to which the firm relies on channel partners drives its investments in marketing spending for direct sales and marketing spending for indirect sales.
Competitive landscape
The way the market perceives the strengths and weaknesses of a particular firm affects the purchasing behavior of its customers. For instance, it is
well known that a manufacturer with a solid brand inspires more trust in
customers, and trust is often a key driver for product selection.
Table 5-9 Market segmentation with route to market information overlay

Segment 1 (small market size, low market growth): Direct sales: poor; Sales through channels: poor
Segment 2 (small market size, high market growth): Direct sales: fair; Sales through channels: poor
Segment 3 (large market size, low market growth): Direct sales: poor; Sales through channels: fair
Segment 4 (large market size, high market growth): Direct sales: poor; Sales through channels: fair
Understanding the competitive landscape in each market segment means understanding a firm's own strengths and weaknesses as well as those of its
competitors. There are many attributes that can be used to evaluate the
competition. Before analyzing these attributes, however, we must identify
the competitors.
The most common way of identifying key existing and potential competitors is to consult industry trade publications, industry financial
analysts, or research experts. There is often ranking information on companies in each industry, product or service category. Ranking can be based
on market share, growth, or financial position. Sales people and customers can also help identify competitors. Since it is extremely challenging to
examine every potential competitor, evaluation should be limited to the
top existing and potential competitors.
Once the key existing and potential competitors have been identified, the next task is to determine which attributes to use to examine the
strengths and weaknesses of each competitor.
In general, there are four groups of attributes to consider when analyzing the strengths and weaknesses of the competition: (1) brand recognition, leadership, vision, and innovation; (2) current product offering; (3) operational efficiency; and (4) financial condition.
● Brand recognition, leadership, vision, and innovation: These four seemingly intangible attributes are sometimes important drivers of
customers’ purchase decisions. Brand recognition refers to a set of
perceptions and feelings evoked in customers or prospects when they
are exposed to ideas such as value propositions or images (logos,
symbols) about particular companies. Brand recognition is the result
of customer experience and interaction with a particular company, or
customer exposure to advertising, marketing, and other activities of
the company. Leadership is the ability of an individual to influence,
motivate, and enable others to contribute toward the effectiveness
and success of the organizations of which they are members (House,
Hanges, Javidan, and Dorfman, 2004). Good leadership is consistently
viewed as a competitive advantage for a company. Vision refers to the
long-term objectives of a company. With its vision as a guiding principle, a company may be more likely to evolve in a manner consistent
with its long-term objectives. Innovation is change that creates a new
dimension of performance (Hesselbein, Johnston, and the Drucker
Foundation, 2002) and drives competitiveness.
● Current product offering: The current product offering of a firm has
unique features and benefits. How these features and benefits are
perceived in the market segment affects customer purchase behavior.
For example, a product that is more reliable than its competing products will attract buyers that value reliability. In addition to reliability,
attributes such as customer service, quality, relevance, convenience,
ease of deployment and installation, scalability, warranties, variety,
and pricing are also important. Service, in particular, has become a
crucial factor for customers when evaluating products.
● Operational efficiency: Operational efficiency in corporate functions such as manufacturing, management, sales, marketing, fulfillment, inventory, and customer service is also important in shaping market perception of a firm. Companies with frequent delays in product
delivery or companies that deliver defective products are likely to be
perceived as companies with operational ‘weakness.’
● Financial condition: Financial condition is the company's overall financial performance, as reflected by indicators such as stock growth,
revenue, profitability, returns on equity, returns on assets, debt and
cash positions that affect the ability of the firm to acquire financing
when necessary, capitalization, and the P/E ratio. A strong financial
condition is considered a competitive advantage.
Competitive analysis methods
There are four analytical formats for analyzing the competitive landscape: tabulation; grid; strength, weakness, opportunity, and threat (SWOT) analysis; and perceptual maps.
● Tabulation: The tabulation format is the easiest approach for compiling competitor information. In the following example, six supermarket chains in the San Francisco Bay Area are evaluated for their
competitiveness by brand recognition. The ratings range from one (the
weakest rating) to five (the strongest rating) in each attribute category.
The six supermarket chains in the analysis are Safeway, Albertsons, Bell, Costco, Whole Foods, and Ranch 99. Safeway and Albertsons are the two mainstream supermarket chains in the Bay Area and the better-known brands of the six; they have considerably more stores than the other four competitors. Costco is well known for its warehouse environment and low prices. Bell is slightly less well known than Safeway, Albertsons, and Costco, while Whole Foods caters to a more upscale market. Ranch 99 mainly caters to the Asian community. Based on the above information, we may give Safeway and Albertsons a rating of five, Costco a rating of four, and Bell and Ranch 99 a rating of three for brand awareness among mainstream
grocery shoppers. The advantage of the tabulation approach is that it
compares each player ’s strengths and weaknesses with each other ’s
in detail. The disadvantage is that the tabulation approach does not
provide a holistic summary of the overall competitiveness. Therefore,
when the number of attributes is large, it’s difficult to derive a clear
overall picture with the tabulation method.
● Grid: A grid is a common format used by research companies to analyze the
competitive landscape. These companies use their proprietary methodologies to examine company competitiveness and present the result
in a grid. Very often a competitive grid has two or three key indicators.
Each key indicator is usually a composite index based on the values
of specific key attributes. Some of these key attributes are similar to
what we have discussed in the tabulation example. Gartner Research
has developed Magic Quadrant, a graphical presentation of the competitive landscape for each of its key technology groups. Forrester
Research has developed a competitive grid called Forrester Wave.
Unlike the tabulation approach, grids don’t show detailed information
about each player ’s strengths and weaknesses at the attribute level.
Instead, grids provide a holistic, synthesized, and graphic summary
view of the competitive landscape.
● SWOT: SWOT, which stands for strengths, weaknesses, opportunities, and
threats, is a very popular format for competitive landscape analysis.
A SWOT analysis summarizes a firm’s overall competitive strengths
and weaknesses, the market opportunity, and the competitor threats.
We now reconsider our previous example of market segmentation of
Company X in Figure 5-3, with a focus on the competitive landscape
in segment four. The market size of this segment is estimated to be
$230m in 2006 and $265m in 2007. The growth from 2006 to 2007 is
15%. The following is a four-step process for constructing a SWOT
analysis. The first step is to identify Company X’s strengths, weaknesses, market opportunities, and competitive threats. The result is
Table 5-10 Identification of the strengths, weaknesses, opportunities, and threats in segment four

Strengths:
● Extensive channel partner network for distributing and selling
● Reliable product
● Competitive price

Weaknesses:
● Company X's market share is a distant no. 2 from that of the market leader, TUV (market share: 50%)
● Company X has poor brand name recognition compared to TUV

Opportunity:
● High overall market growth at 15%

Threats:
● Market leader TUV is actively pursuing Company X's largest customers
● A local vendor in Asia just announced a major price-cutting promotion
illustrated in Table 5-10. The second step is to leverage the strengths to
take advantage of the current opportunities or mitigate the competitive threats. This is shown in Table 5-11. The third step, illustrated in
Table 5-12, is to prevent weaknesses from sabotaging opportunities or
amplifying competitive threats. The fourth step is to maintain areas of strength and strengthen areas of weakness over time.
Table 5-11 Leveraging strength to take advantage of opportunities or mitigate threats

Leveraging strength to take advantage of opportunities:
● Leveraging Company X's extensive channel partner network to capture high market growth (e.g., increasing investment in joint customer seminars with partners)
● Promoting value propositions on product reliability and competitive pricing

Leveraging strength to mitigate threats:
● Creating a customer loyalty program to prevent customer attrition due to TUV's threat
● Examining and negotiating the profit-margin structure with partners in Asia to ensure maximum level of price competitiveness
Table 5-12 Preventing weaknesses from sabotaging opportunities or amplifying threats

Preventing weaknesses from sabotaging opportunities:
● Poor brand recognition may prevent Company X from taking advantage of this opportunity. Company X needs to invest in its brand awareness programs.

Preventing weaknesses from amplifying threats:
● Poor brand recognition may prevent Company X from convincing the market that it provides value with a price premium in Asia. Company X needs to promote its value proposition and brand in Asia.
The outcome of a SWOT analysis will not only guide short-term planning, but also point out areas for improvement for long-term success. Like the other competitive analysis formats, SWOT has its advantages and disadvantages, as shown in Table 5-13.
Table 5-13 Advantages and disadvantages of SWOT

Advantages of the SWOT method:
● Information is easy to acquire from syndicated research companies or internal marketing/sales groups
● Analysis is easy to construct
● Analysis is easy to digest

Disadvantages of the SWOT method:
● Analysis may be subjective
● Hard to quantify the interrelationships between the four components, namely, strength, weakness, opportunity, and threat
● Hard to tie the information to customers, their needs, and their future purchase plans
● Public information may lead to minimal competitive advantages
● Perceptual map: A perceptual map is a graphical depiction of the market perception of
a product. The key difference between a grid and a perceptual map
is that the latter is constructed with statistical data mining techniques
while the former is usually derived from marketing research data
summaries. We now consider the illustrative case study of Company
X in Figure 5-3. A survey is conducted on three groups of audiences
for product A: prospects, high-value customers (customers who have
purchased a high volume of product A), and low-value customers (customers who have purchased a low volume of product A). The three
groups of audiences are asked to identify whether six specific features
are crucial to their purchase decisions. These six features are brand
strength, uniqueness of features, pricing, product quality, customer
service, and ease of acquiring the product. The three groups of audiences are also asked to identify if they associate any of these six features with the three main vendors, Company X, Competitor Y, and
Competitor Z. The results of the survey are compiled and correspondence analysis, which we will discuss in Chapter 7, is conducted to
construct the perceptual map shown in Figure 5-8. From the map, we
can observe the distances between the three target audiences, the six
product features, and the three companies. Close proximity indicates
a higher degree of association.

[Figure 5-8 Perceptual map analysis of product A: the three target audiences (prospects, high-value customers, low-value customers), the six product attributes, and the three companies plotted along two dimensions.]

On the map, Company X is close to
three features: brand strength, product quality, and customer service.
Therefore, these three features are Company X’s strengths and competitive advantages. In addition, Company X is the vendor that is closest to the high-value customer segment. This means that Company X
is viewed by this audience segment more favorably than its two competitors. Like the other competitive analysis formats, the perceptual
map has its advantages and disadvantages, as shown in Table 5-14.
Table 5-14 Advantages and disadvantages of a perceptual map

Advantages of a perceptual map:
● Provides graphic representation of multiple dimensions simultaneously
● Relationships between competitive advantages, target audience, customer needs, and customer future purchase plans are easily quantifiable
● Audience feedback provides an objective view of the competitive landscape
● Proprietary analysis may lead to competitive advantage

Disadvantages of a perceptual map:
● Requires significant investment in time, resources, and expertise in gathering and analyzing relevant data
■ Overview of marketing research
Marketing research is research that helps advance understanding of the
market and the customers, generating information that helps make better marketing investment decisions. The following skills are required for conducting marketing research.
● Economic, business, and statistical knowledge
● Experience in syndicated research and customized research
● Experience in primary and secondary data
● Knowledge of survey sampling, sample size, and questionnaire design
● Knowledge of focus group research
● Knowledge of panel studies
● Knowledge of request for proposal (RFP) and research vendor management
● Knowledge of list rental and list brokerage business
● Ability to communicate and explain complex research concepts to both business and IT audiences
● Ability to provide actionable recommendations to address business issues.
Figure 5-9 shows a step-by-step thought process for marketing research
planning and implementation.
[Figure 5-9 Effective marketing research thought process: identify business objectives; determine final research deliverable requirements; search currently available syndicated research and determine if it addresses the need; if yes, utilize syndicated research; if not, acquire customized research by soliciting research vendor proposals with an RFP, select the appropriate vendor proposal, actively participate in the research process (including sample selection and questionnaire design), and translate research results into actionable business recommendations.]
Throughout the remainder of this chapter, we will introduce the following important market research topics: syndicated research versus
customized research, primary data versus secondary data, sample size,
questionnaire design, focus groups, and panel studies.
Syndicated research versus customized
research
Syndicated research, which can be acquired through subscription, is
research that is prepackaged by research companies. This type of research
is conducted on the basis of the research firms’ assumptions, specifications, and criteria. When searching for market data and intelligence, we
should first consider syndicated research since it is one of the most cost-effective sources. Different research firms specialize in different industries and products. Subscriptions are usually on a one-time, quarterly,
or annual basis. In addition to selling prepackaged syndicated research,
research companies often provide consulting services arranged along the
following lines.
● Subscription to an inquiry service grants subscribers direct access to analysts for additional information beyond the standard reports. There is usually a threshold on how much time a subscriber can spend with analysts, either face to face or by phone, over the duration of the subscription.
● Occasionally, analysts may recommend a one-time project to address a subscriber's additional needs. A one-time project may lead to a customized research project, as we discuss in the next section.
Customized research tends to be more expensive than syndicated
research as the former is customized for very specific needs and the latter
is intended for a broader audience base. Customized research has very
specific objectives and deliverables customized to a particular marketer ’s
needs and often involves collecting primary survey data. Table 5-15 illustrates the main differences between syndicated research and customized
research.
Table 5-15 Syndicated research versus customized research

                       Research specification          Data collectors                        Cost
Syndicated research    Third-party research company    Third party                            Low
Customized research    Marketers themselves            Third party or marketers themselves    High
The following step-by-step process should be followed for planning
and executing customized research:
● Identification of business objectives
● Identification of deliverables that allow the research project to meet the objectives
● Creation of an RFP to solicit research vendors' proposals
● Evaluation and selection of vendor proposals
● Determination of sample size and source
● Designing of the questionnaire(s)
● Collection of data
● Analysis of results to derive learning.
Customized research planning case study
Company ABC is a storage system supplier for Fortune 500 companies in
the US. The overall objective of ABC is to understand the future storage
spending of its customers, their vendor preferences, their purchase
processes, and the appropriate marketing messages. Specifically,
Company ABC wants to address the following questions through customized research.
● How much do the Fortune 500 companies plan to spend on storage in 2008?
● Do different industries have different levels of need in storage systems in 2008?
● Which vendors are top of mind among the Fortune 500 companies when it comes to storage system purchases?
● What are the Fortune 500 companies' selection criteria for storage vendors?
● What marketing messages will resonate well with these Fortune 500 companies?
The following five deliverables can be used to address the questions
above:
● Understanding the 2008 budgets for storage systems among the Fortune 500 companies
● Analyzing the 2008 storage budgets by industry
● Compiling vendor rankings from survey respondents at Fortune 500 companies
● Getting feedback regarding vendor selection criteria
● Getting input regarding drivers and barriers for purchasing storage systems.
After Company ABC identifies its research objectives and deliverables, it creates an RFP to solicit vendor proposals. An RFP is an effective way of collecting vendor proposals for evaluation. An RFP does not
need to be overly complex. However, it needs to cover the following key
components:
● Project overview
● Objectives
● Deliverables
● Methodology
● Proposal submission
● Project timeline
● General conditions and terms.
The next example illustrates the use of RFP to solicit vendor proposals.
Project overview ABC, a company specializing in storage systems,
wants to understand the future needs of this market to prioritize marketing investments and resources.
Objectives The project has the following objectives:
● Understanding the 2008 budgets for storage systems of Fortune 500 companies
● Analyzing the 2008 storage budgets by industry
● Compiling vendor rankings from survey respondents at Fortune 500 companies
● Getting feedback regarding vendor selection criteria
● Getting input regarding drivers and barriers for purchasing storage systems
Deliverables
● Executive summary
● Analysis and recommendations to support the business objectives
● Raw survey data
● An on-site presentation explaining the results and conclusions of the research
Methodology Blind face-to-face interviews
Proposal submission Submit proposal to ABC by September 27, 2007.
Contact information:
Sheila Wu, Research Manager
Tel: (703) 446-5272, e-mail: swu@abc.com
Project completion timeline Completion by November 30, 2007.
General conditions and terms All information provided herein is proprietary to ABC, Inc. This information is furnished specifically and
solely to allow the prospective vendor to estimate the cost of executing
this project. Any other usage of this information is strictly prohibited
without the prior written consent of ABC.
After distributing the RFP, Company ABC waits for the research vendors to submit their responses. ABC should look for the following key
components in a vendor proposal:
● Project overview
● Objectives
● Methodology
● Data and analysis
● Survey sample and questionnaire (if collection of primary survey data is required)
● Deliverables
● Project timeline
● Vendor project team (team member qualification and biography)
● Overview of vendor capability (competitive strengths relative to the other vendors)
● Fees
● General conditions and terms (including legal and contractual agreements).
A vendor proposal is, in essence, a research plan. A proposal should
correspond to the key components in the RFP with more in-depth information. Table 5-16 illustrates a comparison between an RFP and a vendor
proposal in terms of key components.
Table 5-16 Comparison of key components between an RFP and a vendor proposal

Key component required                 RFP    Vendor proposal
Project overview                       Yes    Yes
Objectives                             Yes    Yes
Deliverables                           Yes    Yes
Methodology                            Yes    Yes
Data and analysis                      -      Yes
Survey sample and questionnaire*       -      Yes
Proposal submission                    Yes    -
Project timeline                       Yes    Yes
Overview of vendor capability          -      Yes
Vendor project team and member bio     -      Yes
Professional fee                       -      Yes
General conditions and terms           Yes    Yes

* If primary survey research is required.
Primary data versus secondary data
Primary data refers to research data directly collected from the target
audience. Data collected from the target audiences by others (third parties) is called secondary data. It is often more expensive to acquire primary data than secondary data. However, there are situations where
collection of primary data is necessary. For example,
● The business objectives cannot be met by any existing syndicated research.
● The target audience is so specific that no syndicated research can address the specific need.
● Existing syndicated research reports may offer conflicting information that cannot be reconciled. It is very common for different research companies to produce differing forecasts for the same market.
Surveys
In a survey, a questionnaire or a script is used to collect information from
a group of people through various communication methods such as direct
mail, e-mail, telephone, and face to face. A questionnaire consists of a list
of questions in either multiple choice or open-ended text format.
Survey communication methods
A variety of survey methods are available. Different audiences may have different preferences about how they are surveyed. It is important to ensure that the respondent composition is representative of the overall target audience.
In a direct mail survey, questionnaires are sent to the target audience by
postal mail. Respondents fill out the questionnaire and return it in a business response envelope (BRE) or on a business response card (BRC) by mail.
Cost of direct mail can sometimes be high and is driven mainly by
direct mail production and mail postage. Cost per response is higher if
the target audience reached is irrelevant or unresponsive.
Direct mail response rate is driven mainly by the relevance of the target
audience and address data accuracy. Lower address data quality tends to
result in lower response rate. Response rate may vary by industry, product, and service.
Response time of direct mail varies and ranges from a couple of days to
weeks. Most responses come in within a month. Response time depends
on the complexity of the questionnaire as well. The more complex a questionnaire is, the longer its response time will be.
E-mail
In an e-mail survey, electronic questionnaires are sent to the target audience by e-mail. Respondents may respond by completing the questionnaire on the web, by e-mail, or by other methods.
E-mail cost tends to be low and usually runs at several cents per e-mail
sent. E-mail is one of the least expensive ways of conducting a survey.
E-mail response time is usually much shorter than direct mail response
time. Most responses come in within days.
As is the case with direct mail, e-mail survey response rate depends on
factors such as target audience and accuracy of e-mail address data.
Phone surveys
In a phone survey, the target audience is contacted by phone to answer a
list of questions from a questionnaire or a script read by a phone survey
representative.
Phone interview cost is higher than direct mail and much higher
than e-mail. However, phone interviews have the advantage of getting
respondent data instantly and clarifying any questions or confusion that
respondents may have. When interviewer training is required, additional
training cost needs to be factored in.
Phone interview response time is real time. Once a respondent is
reached and agrees to go through a survey, the data is collected instantly.
The challenge is to successfully reach the target audience.
Phone interview response rate also depends on factors such as responsiveness of the target audience, accuracy of phone numbers, offer, and
target audience availability.
Training is essential for phone interviewers. Interviewers need to possess the basic understanding of the use of a script or questionnaire to
collect answers from respondents. When the research subject is technical or specialized, additional training needs to be given to interviewers
so they can articulate their questions appropriately. In some cases, interviewers need to have certain professional knowledge and experience to
effectively carry out the survey. For example, a phone survey on computer server purchases may require interviewers with in-depth technical
knowledge in the server business. Interviewers also need to be very perceptive of the respondent's reactions and must be able to make adjustments accordingly.
Nowadays, phone interviews are often conducted in call centers where
every interviewer has a cubicle, a phone, and a computer terminal to
access information when needed. Usually a supervisor is assigned several interviewers to monitor. Supervisors are equipped with communication gear to give timely coaching or feedback to their staff interviewers.
Interviewers’ access to timely feedback is one advantage that phone interviews may have over other types of interview methods such as direct
mail, e-mail, or face-to-face interviews.
Computer assisted telephone interviewing (CATI) is designed to
enable phone interviewers to conduct telephone interviews effectively.
CATI enables predictive dialing, questionnaire management, sample
and quota management, data access, data entry, and analysis. Predictive
dialing is a feature of CATI that allows for automatic dialing of batches
of phone numbers to connect phone interviewers with those they
intend to survey. Sample and quota management is a feature of CATI
that tracks and compares a predetermined quota on respondents and the
number of respondents that a phone interviewer actually reaches and
surveys.
Prescheduled or intercept face-to-face interview
Face-to-face interviews can be prearranged with potential interviewees
and conducted in a predetermined location at a preset date and time.
Prearranged interviews allow for careful screening of the target audience
and suitable arrangement of the interview start and end time. Planning
prior to interviews can be very time-consuming and resource intensive.
On the other hand, face-to-face interviews that are not prearranged are
intercept interviews, which do not require a great deal of time for audience screening. The preparation and planning time is minimal, and one
can usually find a large sample of potential interviewees in places such
as shopping malls. However, the quality of interviewees from intercepts
might sometimes be questionable due to lack of prescreening.
As is the case with phone interviews, once a respondent agrees to be
interviewed, the response time of a face-to-face interview is real time.
Response rate depends on numerous factors such as relevance of target
audience and timing of the interview.
Training is essential for face-to-face interviewers. The interviewers need
to have a basic understanding of how to follow a script or questionnaire
to collect answers from the respondents.
In-person interviewers have the opportunity to observe the respondents face to face and adjust questions accordingly.
Panel studies
A panel is a group of people, households, or businesses that respond to
questionnaires on a periodic basis. The duration of a panel can vary
from days to years. Panel surveys are administered through direct mail,
e-mail, and face-to-face interviews. Major research companies usually
have established panels for ongoing surveys and monitoring.
The cost of panel studies depends on the survey mechanism. Usually,
phone and face-to-face interviews are more expensive than direct mail
and e-mail.
The response time also depends on the survey mechanism. The
response rates of panel studies are usually higher since panels usually
consist of dedicated respondents.
The following is a list of examples of existing panels (Blankenship and
Breen, 1995).
● Nielsen Media Research offers national measurement of television viewing, the National Television Index (NTI), using its People Meter to measure the television viewing of various household members.
● Arbitron's Portable People Meter (PPM) measures consumers' exposure to any encoded broadcast signal (e.g., cable TV and radio).
● NPD Group has an online panel of over 3m registered consumers (www.npd.com, 2007).
● Home Testing Institute, a division of Ipsos, has a panel of households available for monthly mailing surveys.
● ACNielsen SCANTRACK collects scanner-based marketing and sales data weekly from a sample of stores.
Panel surveys have several advantages over other alternatives. First,
they provide the possibility of conducting longitudinal studies to observe
behavioral changes in the same sample over time. Second, panel surveys
usually cost less than nonpanel surveys since there is only a one-time
setup cost with panels.
Omnibus studies
An omnibus study is an ongoing study in which new questions can
be added gradually to a regular panel study. Omnibus studies are cost-effective since multiple companies share the up-front survey setup
cost. Omnibus studies are cost-effective when there are few questions to
be added to the survey. Omnibus studies become less cost-effective when
the number of additional questions is large (Blankenship and Breen,
1995).
Focus groups
A focus group is a small discussion group led by an experienced moderator, whose role is to stimulate group interactions. This format has the
advantage of generating group insight that is not attainable through separate one-on-one surveys. Focus groups can be used for exploring new
product ideas, advertising concepts, and customer attitudes and perceptions. It is a qualitative rather than a quantitative method given that the
sample size is very small (usually between 7 and 12 people in a group).
However, insight gathered from a focus group can be very helpful for
planning further research and analysis via other mechanisms.
The cost of a focus group can be significant. Such cost includes the
expense in recruiting the group members, the moderator fee, facility
access, and equipment for monitoring and recording.
Sampling methods
There are two types of sampling methods: probability and nonprobability
sampling (Green, Tull, and Albaum, 1988). Probability sampling involves
applying some sort of random selection. Nonprobability sampling does
not involve application of random selection. There are four types of probability sampling methods.
● Simple random sampling: In simple random sampling, each subject has an equal probability of being selected. The first step in this sampling method is to assign each subject a computer-generated random number. For example, to select a sample of 1000 out of a population of 100,000, one would generate a uniform random number for each of the 100,000 records and select the 1000 records with the highest random numbers.
● Stratified sampling: In stratified sampling, the data is first divided into several mutually exclusive segments, and then a random sample is drawn from each segment.
● Cluster random sampling: In cluster random sampling, the data is first divided into mutually exclusive clusters (segments), and then one cluster is randomly selected. All of the records in the selected cluster will be measured and included in the final sample.
● Multistage sampling: In multistage sampling, more than one of the sampling methods mentioned previously are utilized.
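The first two probability sampling methods translate directly into a few lines of code. The sketch below is a minimal illustration, assuming numpy and pandas are available; the population frame and its 'segment' column are hypothetical.

```python
# Minimal sketch of simple random and stratified sampling, assuming a
# pandas DataFrame `population` with a hypothetical 'segment' column.
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
population = pd.DataFrame({
    "customer_id": range(100_000),
    "segment": rng.choice(["small", "medium", "large"], size=100_000),
})

# Simple random sampling: every record has an equal chance of selection.
simple_sample = population.sample(n=1000, random_state=42)

# Stratified sampling: draw a fixed fraction from each mutually
# exclusive segment, so each stratum is represented.
stratified_sample = (
    population.groupby("segment", group_keys=False)
    .apply(lambda stratum: stratum.sample(frac=0.01, random_state=42))
)

print(len(simple_sample), stratified_sample["segment"].value_counts())
```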
There are four types of nonprobability sampling methods.
● Quota sampling: In quota sampling, a sample is selected based on a predefined quota. For example, given a quota of a 50:50 female-to-male ratio and a total of 1000 subjects, 500 females and 500 males will be selected.
● Convenience sampling: In convenience sampling, samples are drawn from data sources that are easy or 'convenient' to acquire. For example, in clinical trials or shopping mall intercept surveys, respondents are acquired based on their availability. Availability does not guarantee that the sample is representative of the population, however.
● Judgment sampling: In judgment sampling, the sampler has a predefined set of characteristics on which the sampling is based. For example, in a mall intercept survey, the samplers may target adults with an age range between 20 and 30.
● Snowball sampling: In snowball sampling, the sampler relies on 'viral marketing' or 'word of mouth' to increase the sample size. In this case, the original sample size may be small, but the sample size increases as those sampled refer people they know to the sampling process.
Sample size
One frequently asked question in survey research is how big the sample size
should be. In general, we assume that the data has a normal distribution
Understanding the Market through Marketing Research
and select a sample size that will give the desired result within a
given confidence level, such as 95%. A confidence level is defined as the percentage of time the result is expected to be accurate rather than due to chance.
Sample size based on sample mean
In this section, we discuss the derivation of an appropriate sample size
that allows for proper estimation of the mean of an attribute in a sample. The first step in the derivation is to determine an acceptable standard error of the attribute mean estimation, denoted as E. For instance,
we may assume the acceptable standard error as 0.5 years when estimating the mean age of the persons in a sample. The second step is to assess
the standard deviation of the age in the population, denoted as σ. Let us
assume that the standard deviation of the age is 3 years. The third step in
the derivation is to identify the Z score (the concept of Z score will be discussed in Chapter 6) for a predetermined confidence level, such as 95%.
In a normal distribution, the Z score at a 95% confidence level is 1.96. The
fourth step of the derivation is to compute the appropriate sample size.
The sample size is obtained based on the following formula:
n = σ²Z²/E² = (3² × 1.96²)/0.5² ≈ 138        (5.6)
In this example, the appropriate sample size is 138.
Sample size based on sample proportion
This section discusses the process of deriving the appropriate sample
size for estimating the percentage of voters with a particular voting disposition. The first step in the derivation is to estimate the general voting disposition of the population p such as 45% voting for party X. The
second step in the derivation is to determine a standard error E such as
0.5%. The third step in the derivation is to identify the Z score given a
confidence level such as 95%. In a normal distribution, the Z score at a
95% confidence level is 1.96. The fourth step is to compute the appropriate sample size. The sample size can be computed based on the following
formula:
n = Z²p(1 − p)/E² = (1.96² × 0.45 × 0.55)/0.005² ≈ 38,032        (5.7)

In this example, the appropriate sample size n is 38,032.
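Both sample-size formulas are easy to evaluate programmatically. The following sketch simply codes equations (5.6) and (5.7) and plugs in the values used in the text; rounding up to a whole respondent is an added convention, so the first call returns 139 where the text rounds to 138.

```python
import math

def sample_size_for_mean(sigma, z, error):
    """Equation (5.6): n = sigma^2 * Z^2 / E^2."""
    return math.ceil(sigma**2 * z**2 / error**2)

def sample_size_for_proportion(p, z, error):
    """Equation (5.7): n = Z^2 * p * (1 - p) / E^2."""
    return math.ceil(z**2 * p * (1 - p) / error**2)

# Age example: sigma = 3 years, E = 0.5 years, Z = 1.96 (95% confidence)
print(sample_size_for_mean(sigma=3, z=1.96, error=0.5))         # 139 (text rounds to 138)

# Voting example: p = 45%, E = 0.5 percentage points, Z = 1.96
print(sample_size_for_proportion(p=0.45, z=1.96, error=0.005))  # 38032
```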
■ Research report and results
presentation
It is important to deliver a final research report or presentation that
clearly addresses the initial business objectives. Data and information are
important, but actionable recommendations are even more crucial. One
mistake frequently seen in research reporting is the presentation of an abundance of data and charts with no actionable recommendations. The following is a framework for the basic structure of an effective
research report or presentation.
Structure of a research report
● Background: This section gives an overview of the project background. This overview should be consistent with the original proposal and the RFP.
● Outline: The outline section consists of the topics the report discusses. The objective of the outline is to give the reader a clear idea on what to expect throughout the report.
● Executive summary: This is one of the most important sections in the whole report. Busy executives often scan only the summary section to determine if the report is worth further reading. It must be factual and must provide the answers needed to address project objectives. The executive summary must also highlight a set of practical and actionable recommendations.
● Research methodology: This section needs to clearly state the research methodology employed. For example, if a survey is included in the study, the survey mechanism, such as direct mail or e-mail, needs to be clearly stated. If questionnaires are involved, they should be included in an appendix.
● Data sources: In this section, the source of the data and the sample size, if applicable, need to be specified. A clear description of data attributes needs to be given, and data collection methods need to be stated as well. Detailed information about the data can be included in an appendix.
● Key findings: Key findings are a compilation of results in more granular detail than what is presented in the executive summary. It is a good idea to break down the findings into different sections and have a summary for each section. This helps the reader not to get mired in data and numbers.
● Recommendations: Recommendations must be actionable, practical, and need to be fact-based and analysis-driven.
● Reference and acknowledgments: Acknowledgments need to be given when quoting a data source or a piece of analysis that one does not have ownership of. Written permissions from owners of the data sources may be required.
● Appendix: In this section, we can insert additional information such as questionnaires, anecdotal commentary, and detailed information on raw data.
■ References
Blankenship, A.B., and G.E. Breen. State of the Art Marketing Research. AMA/NTC Business Books, Chicago, Illinois, 1995.
comScore Networks. comScore Media Metrix. comScore Press Release, Reston, Virginia, February 28, 2006.
Green, P., D.S. Tull, and G. Albaum. Research for Marketing Decisions, 5th ed. Prentice-Hall, Englewood Cliffs, New Jersey, 1988.
Hesselbein, F., R. Johnston, and the Drucker Foundation. On Leading Change: A Leader to Leader Guide. Jossey-Bass, San Francisco, CA, 2002.
House, R.J., P.J. Hanges, M. Javidan, and P.W. Dorfman. Culture, Leadership, and Organizations: The GLOBE Study of 62 Societies. Sage Publications, Thousand Oaks, CA, 2004.
CHAPTER 6
Data and Statistics Overview
This chapter gives an overview of data and basic statistics with an
emphasis on the data types and distributions that drive the selection of
data mining techniques particularly relevant to quantitative marketing
research.
■ Data types
The data we are concerned with results from assigning values (historical
or hypothetical) to variables used in statistical analysis. Therefore, when
we refer to data types, we also refer to the types of the variables the data
originates from. There are two data types: nonmetric data and metric data. Within each data type, there are subtypes.
Under the nonmetric data type, there are three subtypes: binary, nominal
(categorical), and ordinal. A binary variable has only two possible values.
Whether or not a survey has received a response can be characterized
by a binary variable with only two possible values: ‘response’ and ‘no
response’.
A nominal or categorical variable can have more than two values. For
example, a variable ‘income group’ can have three possible values: ‘high
income’, ‘medium income’, and ‘low income’. The values of a nominal or
categorical variable are given for identification purposes rather than for
quantification. Group number, a variable used to identify specific groups,
is an example of nominal variable.
An ordinal data type differs from binary or nominal types in that it
denotes an order or ranking. Ordinal data does not quantify the difference between any two rankings, however. Assume we rank three movies
on the basis of their popularity. ‘Spiderman II’, ‘Sweet Home Alabama’,
and ‘Anger Management’ have the ranking of first, second, and third,
respectively. ‘Spiderman II’ is more popular than ‘Sweet Home Alabama’
but the ranking does not provide any information on the difference in
popularity between the two movies.
Metric data is also called numeric data and can be either discrete or continuous. A discrete variable, such as the age of persons in a population,
takes on finite values. A continuous variable is a variable for which, within
the variable range limits, any value is possible. Time to complete a task is
a variable with a continuous data type.
■ Overview of statistical concepts
This section provides an overview of fundamental statistical concepts
including a number of basic data distributions.
Population, sample, and the central limit
theorem
A population is a set of measurements representing all measurements
of interest to a sample collector (Mendenhall and Beaver 1991). The population of females aged between 35 and 40 in New York City consists of
all females in that age range in New York City.
A sample is a subset of measurements selected from a population of
interest. For example, a sample may consist of 500 women aged between
35 and 40, randomly selected from the various boroughs in New York
City. In this example, the sample size is 500.
According to the central limit theorem, the distribution of the sum of
independent and identically distributed random variables tends to a normal distribution as the number of such variables increases indefinitely
(Gujarati 1988). The concept of normal distribution will be discussed in
the data distribution section in this chapter.
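A short simulation makes the central limit theorem concrete. The sketch below, a minimal illustration assuming numpy is available, sums heavily skewed exponential random variables and shows the standardized sums losing their skewness (approaching a normal shape) as the number of summed variables grows.

```python
# Minimal central limit theorem illustration with numpy: sums of skewed
# exponential variables look increasingly normal as more are summed.
import numpy as np

rng = np.random.default_rng(0)

for n_terms in (1, 5, 50):
    # 10,000 experiments, each summing n_terms iid exponential draws
    sums = rng.exponential(scale=1.0, size=(10_000, n_terms)).sum(axis=1)
    # Standardize and report skewness; it shrinks toward 0 (normal shape)
    z = (sums - sums.mean()) / sums.std()
    skew = np.mean(z**3)
    print(f"n = {n_terms:3d}  skewness of standardized sums = {skew:5.2f}")
```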
Random variables
In what follows we are interested in data that can be modeled as random
variables. A random variable is a mathematical entity whose value is not
known until an experiment is carried out. In this context, carrying out an
experiment means observing the occurrence of an event and associating
a numerical value to the event. For example, consulting a news report to
determine whether the stock market went up from yesterday to today is
an experiment and the fact that the market went up or down is the event
in question. The amount by which the stock market went up or down is
the value associated with the event just described. This value is the realization of a random variable. In this case, the random variable is the stock
market change between yesterday and today.
Random variables are called discrete if they take on discrete values,
or continuous if they take on continuous values. The Gross Domestic
Product (GDP) is an example of a continuous random variable, whose value
becomes known when the GDP is reported. The number of customers per
hour that visit a store is an example of a discrete random variable.
Next we review basic concepts of probability that we need in order to
model data as random variables.
Probability, probability mass, probability density,
probability distribution, and expectation
Probability is the likelihood of the occurrence of an event. Since random
variables are numerical values associated with events, in what follows we
will simply refer to probability as the likelihood that a random variable
realizes (or takes on) a particular value.
If the random variable is discrete, the probability that it will take on a
particular value is given by its so-called probability mass. The probability
mass is a positive number less or equal to one. The probability mass that
a discrete random variable X takes the value xj is denoted by p(xj).
To describe continuous random variables we need the concept of probability density. If X is a continuous random variable, the probability that X takes on a value within the interval between x and x + dx is given by f(x) dx, where dx is a differential and f(x) is the probability density function. Notice the distinction between X, the random variable, and x, the values X can take.
The probability distribution function (not to be confused with the density function) is the probability that a random variable will take on values
less or equal to a particular value. If the random variable is discrete and can
take on n values, its probability distribution function is defined as follows.
P(x_i) = Σ_{j=1}^{i} p(x_j)        (6.1)
If the random variable is continuous, the probability distribution function is defined as follows:
F(x) = ∫_{x_min}^{x} f(s) ds        (6.2)

where x_min is the minimum value random variable X can take.
In the case of a discrete variable X that can take on n values, the expectation is defined as follows.
E(X) = Σ_{i=1}^{n} x_i p(x_i),    i = 1, 2, 3, …, n        (6.3)
If random variable X is continuous, its expectation is given by the
formula
E(X) = ∫_{x_min}^{x_max} x f(x) dx        (6.4)

where x_min and x_max are the smallest and largest values the continuous random variable X can take. The expectation is usually denoted by the Greek letter μ.
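As a small illustration of equation (6.3), the expectation of a discrete random variable is simply a probability-weighted sum of its values. The values and probabilities below are hypothetical.

```python
# Expectation of a discrete random variable, per equation (6.3):
# E(X) = sum over i of x_i * p(x_i). Values and probabilities are hypothetical.
visits = [0, 1, 2, 3]          # possible values x_i (store visits per week)
probs = [0.2, 0.5, 0.2, 0.1]   # probability mass p(x_i), sums to 1

expectation = sum(x * p for x, p in zip(visits, probs))
print(expectation)  # 0*0.2 + 1*0.5 + 2*0.2 + 3*0.1 = 1.2
```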
Mean, median, mode, and range
There are two types of means: arithmetic and geometric. The properties
we discuss next are defined for both continuous and discrete random variables. For simplicity, however, in this section we focus on the discrete case.
In the case of discrete random variable, X, the arithmetic mean is given
by the average of its possible values.
X̄_a = (Σ_{i=1}^{n} x_i)/n,    i = 1, 2, 3, …, n        (6.5)
If the values 2, 6, 8, 9, 9, and 11 are instances of the random variable X,
its mean is
X̄_a = (2 + 6 + 8 + 9 + 9 + 11)/6 = 7.5
The geometric mean of a discrete random variable is given by the geometric average of its possible values.
X̄_g = (x_1 × x_2 × … × x_n)^{1/n} = (Π_{i=1}^{n} x_i)^{1/n},    i = 1, 2, 3, …, n        (6.6)
For the same realized values of X, the geometric mean is
X̄_g = (2 × 6 × 8 × 9 × 9 × 11)^{1/6} ≈ 6.64
The median is the value in the middle position in a sorted array of values.
If there are two values in the middle position, the median is the average of
these two values. The median in the example we are discussing is 8.5.
The mode is the number that appears most frequently in a group of values. In our example, the mode is 9.
The range is the difference between the largest and the smallest values in
a group of values. In our example, the range is 9, the difference between 2
and 11.
Variance and standard deviation
The variance of a population of N observations, σ², is the mean of the squares of the deviations of the observations from the population mean μ.
σ² = (1/N) Σ_{i=1}^{N} (x_i − μ)²        (6.7)
The standard deviation of a population is the positive square root of
the population variance.
σ = √[(1/N) Σ_{i=1}^{N} (x_i − μ)²]        (6.8)
The standard error of the mean of a sample of size N is given by the
expression
σ/√N        (6.9)
The variance of a sample of size n ≤ N is defined in the same way as the variance of the population, with N replaced by n. However, especially when the sample size is small, it is preferable to use an alternative expression for the variance of the sample where n is replaced by n − 1, as follows:

s² = [1/(n − 1)] Σ_{i=1}^{n} (x_i − x̄)²        (6.10)
The reason why this formula is preferred to the n-denominator expression for the variance is that s² is an unbiased estimator of the population variance. The corresponding expression for the standard deviation of a sample is (Mendenhall and Beaver 1991):
s = √{[1/(n − 1)] Σ_{i=1}^{n} (x_i − x̄)²}        (6.11)
In our example of the sample with six realized values, 2, 6, 8, 9, 9, and 11, the variance of the sample is 9.9 and the standard deviation of the sample is approximately 3.15.
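These descriptive statistics can be verified with Python's standard library; note that statistics.variance and statistics.stdev use the n − 1 denominator of equations (6.10) and (6.11).

```python
# Check the worked example with the standard library's statistics module.
import statistics

values = [2, 6, 8, 9, 9, 11]

print(statistics.mean(values))            # 7.5   (arithmetic mean)
print(statistics.geometric_mean(values))  # ~6.64 (geometric mean)
print(statistics.median(values))          # 8.5
print(statistics.mode(values))            # 9
print(statistics.variance(values))        # 9.9   (sample variance, n - 1)
print(statistics.stdev(values))           # ~3.15 (sample standard deviation)
```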
Percentile, skewness, and kurtosis
As before, we focus on the discrete case for simplicity. The p percentile
is a value such that p% of the observations in a sample have a value less
than this value.
The skewness gives an indication of the deviation from symmetry of a
density function (Rice 1988).
Skew = [(1/n) Σ_{i=1}^{n} (x_i − x̄)³] / s³        (6.12)
The kurtosis characterizes the tails of a distribution (Rice 1988):

Kurtosis = [(1/n) Σ_{i=1}^{n} (x_i − x̄)⁴] / s⁴        (6.13)
In our example of the sample with six realized values, 2, 6, 8, 9, 9, and 11, the skewness of the sample is −0.642 and the kurtosis of the sample is 1.84.
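Equations (6.12) and (6.13) divide the average third and fourth powers of the deviations by s³ and s⁴, where s is the sample standard deviation from equation (6.11). A direct translation is below; statistical packages often use slightly different conventions, so their results may differ.

```python
# Skewness and kurtosis per equations (6.12) and (6.13), using the
# sample standard deviation s (n - 1 denominator) from equation (6.11).
import statistics

values = [2, 6, 8, 9, 9, 11]
n = len(values)
mean = statistics.mean(values)
s = statistics.stdev(values)

skewness = sum((x - mean) ** 3 for x in values) / n / s ** 3
kurtosis = sum((x - mean) ** 4 for x in values) / n / s ** 4

print(round(skewness, 3))  # about -0.642
print(round(kurtosis, 3))  # about 1.84
```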
We often use the normal density as a reference to characterize the tails’
size by defining the excess kurtosis. Since the kurtosis of the normal density function is equal to three, the excess kurtosis is given by
Excess kurtosis = [Σ_{i=1}^{N} (x_i − μ)⁴]/(Nσ⁴) − 3        (6.14)
Probability density functions
The probability density function defines the distribution of probability
among different realizable values of a random variable. This section gives
an overview of probability density functions of eight commonly used
data distributions: uniform, binomial, Poisson, exponential, normal, chi-square, Student's t, and F distributions.
Uniform distribution
A random variable with uniform distribution has a constant probability
density function. If a and b are the minimum and maximum values the
random variable can take, the uniform density function is
f(x) = 1/(b − a)        (6.15)

The expectation and variance of x are given by

μ = (a + b)/2
σ² = (b − a)²/12
Normal distribution
The probability density of a normally distributed random variable is

$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(x - \mu)^2 / 2\sigma^2} \tag{6.16}$$

where −∞ < x < ∞, μ is the expectation, and σ is the standard deviation.

The normal density can be standardized by rescaling the random variable as follows:

$$Z = \frac{x - \mu}{\sigma} \tag{6.17}$$

The density function of Z is normal with zero expectation and unit variance. The notation Z ~ N(0,1) is used to denote that Z is a random variable drawn from a standardized normal distribution.
Binomial distribution
The probability density function of a binomial random variable is as follows (Mendenhall and Beaver 1991):

$$f(x) = \frac{N!}{x!\,(N - x)!}\, p^x q^{N - x} \tag{6.18}$$

A binomial event has only two outcomes. For example, an undertaking whose outcome can be described by success or failure can be characterized by a binomial distribution. The integer value x is the number of successes in a total of N trials, where 0 ≤ x ≤ N. The success outcome has probability p and the failure outcome has probability q = 1 − p.

The expectation and variance of a binomial random variable are

$$\mu = Np, \qquad \sigma^2 = Np(1 - p)$$
Poisson distribution
A Poisson distribution characterizes the number of occurrences of an
event in a given period of time. This distribution is appropriate when
there is no memory affecting the likelihood of the number of events from
period to period. The probability density function of a Poisson distribution is as follows.
$$f(x) = \frac{\lambda^x e^{-\lambda}}{x!} \tag{6.19}$$

The variable x represents the number of event occurrences during a given period of time, during which on average λ events occur. Both the expectation and the variance of x are equal to λ.
Exponential distribution
Exponential random variables characterize inter-arrival times in Poisson-distributed events. The probability density function of an exponential distribution is as follows:

$$f(x) = \lambda e^{-\lambda x} \tag{6.20}$$

where λ > 0, and the expectation and variance are given by

$$\mu = \frac{1}{\lambda}, \qquad \sigma^2 = \frac{1}{\lambda^2}$$

The exponential distribution reflects the absence of memory in the inter-arrival times of Poisson-driven events.
Chi-square (χ²) distribution
A chi-square density characterizes the distribution of the sum of squares of independent standardized normally distributed random variables, Zi:

$$\chi^2_k = \sum_{i=1}^{k} Z_i^2 \tag{6.21}$$

Here, k is the number of degrees of freedom, which is also the number of independent standardized normal variables in the sum.
The probability density function of the chi-square distribution is as
follows.
$$f(x) = \frac{e^{-x/2}\, x^{\frac{k}{2} - 1}}{2^{k/2}\,\Gamma\!\left(\frac{k}{2}\right)} \tag{6.22}$$

The gamma function, Γ, is defined as

$$\Gamma(\alpha) = \int_0^{\infty} t^{\alpha - 1} e^{-t}\, dt \tag{6.23}$$

and has the recursive property

$$\Gamma(\alpha + 1) = \alpha\,\Gamma(\alpha) \tag{6.24}$$

The mean of a chi-squared random variable is equal to k, and its variance is

$$\sigma^2 = 2k$$
Student’s t distribution
A Student's t distribution describes the ratio, t, of a standardized normally distributed random variable, Z1, and the square root of a χ²-distributed random variable, Z2, divided by its degrees of freedom (Gujarati 1988):

$$t = \frac{Z_1}{\sqrt{Z_2 / n}} \tag{6.25}$$

The probability density function of t is

$$f(x) = \frac{\Gamma\!\left(\frac{n+1}{2}\right)}{\sqrt{n\pi}\,\Gamma\!\left(\frac{n}{2}\right)} \left(1 + \frac{x^2}{n}\right)^{-\frac{n+1}{2}} \tag{6.26}$$

The mean of a t distribution is zero. Its variance is

$$\sigma^2 = \frac{k}{k - 2}$$

where k is the number of degrees of freedom.
F distribution
The F distribution describes the ratio of two χ²-distributed random variables Z1 and Z2, with k1 and k2 degrees of freedom, respectively:

$$F = \frac{Z_1 / k_1}{Z_2 / k_2} \tag{6.27}$$
The probability density function is as follows:

$$f(x) = \frac{1}{x\, B\!\left(\frac{k_1}{2}, \frac{k_2}{2}\right)} \sqrt{\frac{(k_1 x)^{k_1}\, k_2^{k_2}}{(k_1 x + k_2)^{k_1 + k_2}}} \tag{6.28}$$

Here, B(k1/2, k2/2) is the beta function with parameters k1/2 and k2/2.

The expectation and variance of an F-distributed variable are

$$\mu = \frac{k_2}{k_2 - 2} \quad \text{for } k_2 > 2, \qquad \sigma^2 = \frac{2 k_2^2 (k_1 + k_2 - 2)}{k_1 (k_2 - 2)^2 (k_2 - 4)} \quad \text{for } k_2 > 4 \tag{6.29}$$

The variance does not exist when k2 ≤ 4.
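All eight densities discussed in this section are available in standard statistical libraries. The following is a minimal sketch assuming SciPy is installed; the parameter values and the evaluation point are illustrative only and are not taken from the text.

```python
# Minimal sketch: evaluating the eight densities/mass functions with SciPy
from scipy import stats

x = 1.5
print(stats.uniform(loc=0, scale=10).pdf(x))   # uniform on [0, 10], Eq. (6.15)
print(stats.norm(loc=0, scale=1).pdf(x))       # standard normal, Eq. (6.16)
print(stats.binom(n=100, p=0.1).pmf(10))       # binomial, Eq. (6.18)
print(stats.poisson(mu=4).pmf(2))              # Poisson with lambda = 4, Eq. (6.19)
print(stats.expon(scale=1 / 4).pdf(x))         # exponential with lambda = 4, Eq. (6.20)
print(stats.chi2(df=5).pdf(x))                 # chi-square with k = 5, Eq. (6.22)
print(stats.t(df=5).pdf(x))                    # Student's t with n = 5, Eq. (6.26)
print(stats.f(dfn=3, dfd=10).pdf(x))           # F with k1 = 3, k2 = 10, Eq. (6.28)
```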
Independent and dependent variables
An independent variable is also called a predictive variable. Prediction in
this context means estimating the possible value of a dependent variable
with a given level of confidence. A dependent variable is also called an
outcome variable.
Covariance and correlation coefficient
Covariance measures the level of co-variability between two random variables. If X and Y are random variables, their covariance is defined by the expression

$$\mathrm{cov}(X, Y) = E\big[(x - \mu_x)(y - \mu_y)\big] = E(xy) - \mu_x \mu_y \tag{6.30}$$

where μx and μy are the means of X and Y, respectively.

A correlation coefficient between two variables X and Y gives an indication of the level of linear association between the two variables. There are several standard formulations for the level of association, of which the Pearson correlation coefficient is the most popular. The Pearson correlation coefficient,

$$r = \frac{\mathrm{cov}(X, Y)}{\sigma_x \sigma_y} \tag{6.31}$$

measures the level of linear association between two random variables. Here, σx and σy are the standard deviations of X and Y, respectively.
Besides the Pearson correlation coefficient, Kendall's coefficient of concordance (τ) and Spearman's rank correlation coefficient are also commonly used measures of association for numeric variables. Pearson's coefficient of mean square contingency and Cramer's contingency coefficient are used to measure association between nominal variables. The Kendall–Stuart τc, the Goodman–Kruskal γ, and Somers' d are used to measure association between ordinal variables (Liebetrau 1983).
Kendall’s coefficient
Two pairs of observations (Xi, Yi) and (Xj, Yj) are said to be concordant if (Xi − Xj)(Yi − Yj) > 0. They are said to be discordant if (Xi − Xj)(Yi − Yj) < 0. They are said to be tied if (Xi − Xj)(Yi − Yj) = 0.

Kendall's coefficient of concordance is defined as

$$\tau = \pi_c - \pi_d \tag{6.32}$$

where πc is the probability of concordance,

$$\pi_c = P[(X_i - X_j)(Y_i - Y_j) > 0]$$

and πd is the probability of discordance,

$$\pi_d = P[(X_i - X_j)(Y_i - Y_j) < 0]$$

Given the probability of ties,

$$\pi_t = P[(X_i - X_j)(Y_i - Y_j) = 0]$$

the following condition must be satisfied:

$$\pi_c + \pi_d + \pi_t = 1$$
The following are two alternative estimates of the Kendall coefficient of concordance (Liebetrau 1983):

$$\hat{\tau} = \frac{C - D}{\binom{n}{2}} = \frac{2(C - D)}{n(n - 1)} \tag{6.33}$$
where C is the number of concordant pairs, D is the number of discordant
pairs, and n is the total number of pairs.
An alternative expression (Liebetrau 1983) is

$$\hat{\tau}_b = \frac{C - D}{\left\{\left[\binom{n}{2} - U\right]\left[\binom{n}{2} - V\right]\right\}^{1/2}} \tag{6.34}$$

where $U = \sum_i \binom{m_i}{2}$, $V = \sum_j \binom{n_j}{2}$, mi is the number of observations in the ith set of tied X values, and nj is the number of observations in the jth set of tied Y values.
Spearman’s rank correlation coefficient is similar to Pearson’s except
that the former is based on ranks rather than on values. Ranks are determined by the relative values of the numbers in a series. In a series of N
numbers, the largest number has a rank of one, the second largest number
has a rank of two, and the smallest number has a rank of N.
$$r_s = \frac{\sum_{i=1}^{n}(R_i - \bar{R})(S_i - \bar{S})}{\left\{\sum_{i=1}^{n}(R_i - \bar{R})^2 \sum_{i=1}^{n}(S_i - \bar{S})^2\right\}^{1/2}} \tag{6.35}$$

where Ri is the rank of Xi among the X's, Si is the rank of Yi among the Y's, and n is the total number of pairs.
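The three association measures for numeric variables described above can be computed directly with SciPy. The sketch below uses two made-up sample vectors, x and y; it is an illustration of the library calls rather than a worked example from the text.

```python
# Minimal sketch: Pearson, Spearman, and Kendall association measures
import numpy as np
from scipy import stats

x = np.array([2, 6, 8, 9, 9, 11], dtype=float)
y = np.array([1, 5, 6, 9, 8, 12], dtype=float)

r, _   = stats.pearsonr(x, y)     # Pearson correlation, Eq. (6.31)
rho, _ = stats.spearmanr(x, y)    # Spearman rank correlation, Eq. (6.35)
tau, _ = stats.kendalltau(x, y)   # Kendall's tau (tie-corrected), Eqs. (6.33)-(6.34)

print(r, rho, tau)
```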
Both Pearson's coefficient of mean square contingency and Cramer's contingency coefficient are based on the following chi-square statistic of a contingency table:

$$\chi^2 = \sum_{i=1}^{I}\sum_{j=1}^{J} \frac{(n_{ij} - n p_{ij})^2}{n p_{ij}} \tag{6.36}$$
Contingency tables display frequency data in a two-way cross-tabulation and are used by researchers to examine the independence of two methods of classifying the data. For instance, a group of individuals can be classified by whether they are married and whether they are employed. In this case marriage and employment are the two methods of classifying the individuals. Figure 6-1 shows a typical contingency table.
Pearson’s coefficient of mean square contingency is a statistic used to
measure the deviation of the realized counts from the expected counts for
determining the independence of the two classification methods. The formula for the Pearson’s coefficient is as follows (Liebetrau 1983).
Figure 6-1 Visualization of a two-way contingency table (Liebetrau 1983). The rows correspond to the I categories of variable X and the columns to the J categories of variable Y. Each cell contains the observed count nij and the proportion pij; the row totals are ni+ (proportions pi+), the column totals are n+j (proportions p+j), and the grand total is n, with total proportion p = 1.
$$\phi^2 = \sum_{i=1}^{I}\sum_{j=1}^{J} \frac{(p_{ij} - p_{i+}\, p_{+j})^2}{p_{i+}\, p_{+j}} = \sum_{i=1}^{I}\sum_{j=1}^{J} \frac{p_{ij}^2}{p_{i+}\, p_{+j}} - 1 \tag{6.37}$$
Cramer's contingency coefficient, given by Eq. (6.38), measures the association between two variables as a percentage of their maximum possible variation:

$$\nu = \left(\frac{\phi^2}{q - 1}\right)^{1/2} \tag{6.38}$$

where q is min{I, J}.
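The chi-square statistic of Eq. (6.36) and Cramer's coefficient of Eq. (6.38) can be computed for an observed contingency table as in the following sketch, assuming SciPy; the married/employed counts are invented for illustration.

```python
# Minimal sketch: chi-square statistic and Cramer's coefficient for a 2x2 table
import numpy as np
from scipy.stats import chi2_contingency

# rows: married / not married; columns: employed / not employed (made-up counts)
counts = np.array([[30, 10],
                   [20, 40]])

# correction=False keeps the plain chi-square of Eq. (6.36), without Yates correction
chi2, p_value, dof, expected = chi2_contingency(counts, correction=False)

n = counts.sum()
phi2 = chi2 / n                       # mean square contingency, Eq. (6.37)
q = min(counts.shape)
cramers_v = np.sqrt(phi2 / (q - 1))   # Cramer's coefficient, Eq. (6.38)

print(chi2, p_value, cramers_v)
```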
The Kendall–Stuart τc, the Goodman–Kruskal γ, and Somers' d are statistics that are derived from Kendall's τ.
The Kendall–Stuart τc equals the excess of concordant over discordant pairs times another term representing an adjustment for the size of the table. Goodman and Kruskal's gamma is a symmetric statistic that ranges from −1 to 1, based on the difference between concordant pairs and discordant pairs. Somers' d is Goodman and Kruskal's gamma modified to penalize for pairs tied only on X.
Kendall–Stuart coefficient

$$\tau_c = \frac{2(C - D)}{n^2 (q - 1)/q} = \frac{2q(C - D)}{n^2 (q - 1)} \tag{6.39}$$
Goodman–Kruskal's coefficient

$$\gamma = \frac{\pi_c - \pi_d}{1 - \pi_t} = \frac{\pi_c - \pi_d}{\pi_c + \pi_d} = \frac{\pi_c - \pi_d}{1 - \sum_{i=1}^{I} p_{i+}^2 - \sum_{j=1}^{J} p_{+j}^2 + \sum_{i=1}^{I}\sum_{j=1}^{J} p_{ij}^2} \tag{6.40}$$
Somers' coefficient

$$d_{Y \cdot X} = \frac{\pi_c - \pi_d}{1 - \pi_{t_X} - \pi_{t_{XY}}} = \frac{\pi_c - \pi_d}{1 - \sum_{i=1}^{I} p_{i+}^2} \tag{6.41}$$

Here, πtX is the probability that two randomly selected observations are tied only on X, and πtXY is the probability that two randomly selected observations are tied on both X and Y.
Tests of significance
A significance test quantifies the statistical significance of hypotheses. We
will follow the paradigm established by Neyman-Pearson to posit significance tests.
In establishing a significance test, the probability distributions are
grouped into two aggregates, one of which is called the null hypothesis,
denoted by H0, and the other of which is called the alternative hypothesis, denoted as HA (Rice 1988). Null hypotheses often specify, or partially
specify, the value of a probability distribution (Rice 1988). The acceptance
area is the area under the probability density curve of the distribution
specified by the null hypothesis. The rejection area is the area under the
probability density curve of the distribution specified by the alternative
hypothesis.
There are two types of significance tests: one-tailed and two-tailed. A one-tailed test specifies the rejection area under only one tail of the probability density curve of the test statistic. A two-tailed test specifies the rejection area under both tails of the probability density curve of the distribution of the test statistic (Mendenhall and Beaver 1991). For instance, a null hypothesis may state that the probability of getting ten successes in one hundred trials is 0.1. The alternative hypothesis in a one-tailed test may state that the probability of getting ten successes in one hundred trials is less than 0.1. A two-sided alternative hypothesis may state that the probability of getting ten successes in one hundred trials is less than 0.1 or greater than 0.1.
According to the Neyman-Pearson paradigm, a decision as to whether
or not to reject H0 in favor of HA is made on the basis of T(X), where X
denotes the sample values and T(X) is a suitable statistic (Rice 1988). This
decision is affected by the error tolerance, which is defined by either error
of type I or error of type II.
A type I error consists in rejecting H0 when it is true. The probability of rejecting H0 when it is true is denoted as α, called the significance level. In a one-tailed test, the probability of T(X) exceeding the critical statistic T*(X) is α. The confidence level is (1 − α) and the 100(1 − α) percent confidence interval is T(X) ≤ T*(X). In a two-tailed test, the probability of T(X) exceeding the upper critical value T*(X) is α/2, and the probability of T(X) falling below the lower critical value −T*(X) is also α/2.

A type II error occurs when we accept H0 when it is false. The probability of accepting H0 when it is false is denoted as β. The power of a test is 1 − β.
Z Test
In a Z test, the test statistic T(X) is defined as follows:

$$Z = \frac{X - \mu}{\sigma} \tag{6.42}$$

where μ is the mean of X, σ is the standard deviation of X, and X is assumed to follow a normal distribution.

A Z table shows the critical Z scores at a pre-specified significance level α. If we assume a pre-specified α of 0.05 and a two-tailed test, then the shaded area under the probability density curve is 0.5 minus α/2. This area, 0.475 (or 47.5%), corresponds to a Z score of 1.96, as shown in Figure 6-2.
t Test
In a t test, T(X) is called a t score and is a statistic used to determine
whether a null hypothesis can be rejected. When we conduct a t test,
we assume that the test statistic follows a Student's t distribution. The t score is
given by
Figure 6-2 Z table: areas under the standard normal curve between 0 and z, tabulated for z from 0.00 to 3.09 in increments of 0.01. For example, the tabulated area for z = 1.96 is 0.4750.
$$t = \frac{X - \bar{X}}{s} \tag{6.43}$$

where X̄ is the mean of X and s is the standard deviation of X.
A t table shows the critical t score at a prespecified significance level α, parametrically in the number of degrees of freedom. If we assume a prespecified α of 0.05 and a two-tailed test, then the shaded area under the probability density curve is 0.5 minus α/2. To identify the critical t value, we need two pieces of information: the degrees of freedom, n − 1, and the pre-specified significance level α. If we assume that the total number of observations in the sample is 30 and α is 0.05, the number of degrees of freedom of the sample is 29 and α/2 is 0.025. With this information, in Figure 6-3 we identify the critical t value as 2.0452.
Figure 6-3 t table: critical values t(p, df) for upper-tail probabilities p = 0.4, 0.25, 0.1, 0.05, 0.025, 0.01, 0.005, and 0.0005 and degrees of freedom df = 1 to 30 and infinity. For example, t(0.025, 29) = 2.0452.
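The critical values quoted above can also be recovered from the standard normal and Student's t quantile functions rather than from printed tables. The following is a minimal sketch assuming SciPy is available.

```python
# Minimal sketch: two-tailed critical values at alpha = 0.05
from scipy import stats

alpha = 0.05

z_crit = stats.norm.ppf(1 - alpha / 2)        # ~1.96, as in Figure 6-2
t_crit = stats.t.ppf(1 - alpha / 2, df=29)    # ~2.0452, as in Figure 6-3

print(z_crit, t_crit)
```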
Experimental design
The main objective of experimental design is to ensure the validity of the
conclusions from a study or survey. Experimental design is used to avoid
study or survey design flaws that may skew the results.
An experimental design is a process that seeks to discover the functional forms that relate the independent (predictive) variables and the
dependent (outcome) variables in a study (Green, Tull, and Albaum 1988).
Depending on the level of information available, an experimental
design aims to accomplish any of the following tasks.
● Getting numeric parameter estimates only if the statistical functional form is already known
● Building a model if the statistical functional form is unknown
● Identifying relevant variables (independent and dependent) if the statistical functional form is known but the variables are unknown.
Experimental design terminology
The following is a list of frequently used terms in experimental design
(Green, Tull, and Albaum 1988).
● Units: Units are individuals, subjects, or objects.
● Treatments: Treatments are the independent (or predictive) variables in an experimental design, calibrated to observe potential causality.
● Control units: These are objects, individuals, or subjects that are not subjected to any treatment. A group that consists of control units is called a control group.
● Test units: These are objects, subjects, or individuals that are given a particular treatment. A group that consists of test units is a treatment or test group.
● Natural experiment: An experiment that requires minimal intervention and no calibration of variables.
● Controlled experiment: An experiment that requires an investigator's intervention and calibration of variables to discover a causal effect. Two kinds of interventions are necessary: random placement of subjects into a control or a treatment group, and calibration of at least one variable assumed to be causal. Experiments that meet both intervention conditions are called true experiments.
● Quasi-experiment: Experiments that contain manipulation of at least one assumed causal variable, but do not have random assignment of subjects into control or experiment groups.
● Block: A block is a group of similar units of which roughly equal numbers of units are assigned to each treatment group.
● Replication: Replication is the creation of repeated measurements in a control or treatment group.
● Completely randomized design: This is a design where test units are assigned experimental treatments on a random basis.
● Full factorial design: This type of design assigns an equal number of observations to all combinations of the treatments involving at least two levels of at least two variables.
● Latin square design: A technique for reducing the number of observations required in a full factorial design.
● Cross-over design: In this design, different treatments are applied to the same test unit in different periods of time.
● Randomized-block design: This design is usually used when a researcher needs to eliminate a possible source of error. In this case, each test unit is regarded as a 'block' and all treatments are applied to each of these blocks.
■ References
Green, P.E., D.S. Tull, and G. Albaum. Research for Marketing Decisions, 5th ed.
Prentice Hall, Englewood Cliffs, New Jersey, 1988.
Gujarati, D.N. Basic Econometrics, 2nd ed. McGraw-Hill, New York, 1988.
Liebetrau, A.M. Measures of Association. Quantitative Applications in the Social
Sciences, Sage Publications, Thousand Oaks, CA, 1983.
Mead, R. The Design of Experiments – Statistical Principles for Practical Application.
Cambridge University Press, New York, 1988.
Mendenhall, W., and R. Beaver. Introduction to Probability and Statistics, 8th ed.
PWS-Kent Publishing Company, Boston, MA, 1991.
Rice, J.A. Mathematical Statistics and Data Analysis, Statistics/Probability Series.
Wadsworth & Brooks/Cole, Belmont, CA, 1988.
CHAPTER 7 Introduction to Data Mining
A wide variety of data mining approaches have been developed to address
a broad spectrum of business problems. Techniques such as logistic regressions are used for building targeting models, and approaches such as
association analysis are used for building cross-sell or up-sell models.
Effective use of data mining to identify potential revenue opportunities
along the sales pipeline may result in higher returns on investment and
the creation of a competitive advantage.
The objective of this chapter is to introduce the fundamentals of the
most commonly used data mining techniques. Chapters 8–10 discuss several case studies based on some of these techniques.
■ Data mining overview
We define data mining in terms of:
● The use of statistical or other analytical techniques to process and analyze raw data to find meaningful patterns and trends
● The extraction and use of meaningful information and insight to produce actionable business recommendations and decisions
The focal point of effective data mining is to analyze data in order to
make actionable business recommendations. Without the latter, data mining is an intellectual exercise with no real life application. In our experience, insufficient focus on actionable business recommendations is often
the main reason that data mining may not have been as widely adopted
by some organizations as would have been desirable.
Data mining can be applied to the solution of a broad range of business
problems. The following is a list of standard applications of data mining
techniques.
● Development of customer segmentation
Customer segmentation and profiling analysis constitutes the first step toward understanding the target audience. Understanding of the target audience drives effective advertising, offers, and messaging. As we pointed out in Chapter 5, marketing plan objectives determine a variety of segmentation types. For example, if the marketing plan calls for the creation of segments with differentiated needs, then need-based segmentation is required. Common segmentation types are:
– Need-based segmentation
– Demographics-based segmentation
– Value-based segmentation
– Product purchase-based segmentation
– Profitability-based segmentation
● Customer profiling
Profiling analysis creates descriptions of segments by their unique characteristics and attributes. For instance, a segment profile may consist of attributes such as an age range of thirty-five and older, and an annual household income of $75,000 or higher.
● New customer acquisition
New customer acquisition can be costly. Predictive targeting models built with data mining techniques allow us to effectively target prospects with the greatest propensity of converting to customers.
● Minimization of customer attrition or churn
Attrition among existing customers results in immediate revenue loss. Data mining can be used to predict future attrition and mitigate customer defection by understanding the factors responsible for attrition.
● Maximization of conversion
Increases in conversion of responders to leads and leads to buyers can expedite the sales process. High conversion is one of the keys to high investment returns. Data mining can help understand the primary conversion drivers.
● Cross-selling and up-selling of products
Experience shows that it is much more expensive to generate revenues from new customers than from existing customers. The main reason is that building relationships and trust is costly and time-consuming. Data mining can be used to build models for quantifying additional product and service sales to existing customers.
● Personalization of messages and offers
Personalized messages and offers tend to solicit higher response rates from the target audience than generic ones. Data mining techniques such as collaborative filtering can be used to create real-time, personalized offers and messages.
● Inventory optimization
Data mining facilitates more accurate forecasts of inventory needs, avoiding unnecessary waste due to over- or under-stockpiling of inventory.
● Forecasting marketing program performance
It is a common need of firms to forecast revenues, responses, leads, and web traffic. Data mining techniques such as time series and multivariate regression analyses can be applied to address such forecasting needs.
● Fraud detection
The federal government of the United States was an early adopter of data mining technology. As part of the investigation of the Oklahoma bombing crime in 1995, the FBI used data mining analysis to sift through thousands of reports submitted by agents in the field looking for connections and possible leads (Berry and Linoff 1997). In the credit industry, customer fraud can cause significant financial damage to lenders. Predictive modeling can be applied to address this issue by modeling the probability of fraud at an individual level.
■ An effective step-by-step data mining thought process
Figure 7-1 illustrates a step-by-step thought process for data mining.
Figure 7-1 Effective data mining thought process: Step 1, identify business objectives and goals; Step 2, determine key business areas and metrics to focus on; Step 3, translate business issues into technical problems; Step 4, select appropriate data mining techniques and software tools; Step 5, identify data sources; Step 6, perform analysis; Step 7, translate analytical results into actionable business recommendations.
Step one: identification of business
objectives and goals
The first step of the process is to identify the objectives and the goals of
a marketing effort. Objectives are something defined at a more abstract
level and in a less quantitative manner than goals, which are usually
quantifiable. For example, a business objective may be to increase sales
of the current fiscal year and the goal may be to increase the sales of the
current fiscal year by 15% given the same amount of investment as that of
the last fiscal year.
Step two: determination of the key focus
business areas and metrics
Once the objectives and goals of a marketing effort have been identified,
the next step is to determine on which business areas to focus and what
metrics to use for measuring returns. For instance, incremental marketing
returns may come from the existing customer base, from new customers, from an increase in the efficiency of marketing operations, or from a
reduction in the number of fraud cases.
Consider an online publisher whose main revenue source is advertising. Advertising revenue is based on cost per thousand impressions
(CPM). The cost of having an ad exposed to one million impressions is $1000 at a CPM of $1. Assume the publisher has as an objective to
increase traffic (impressions) to his web site and a goal of increasing his
advertising revenue by 15% over the next three months. There are several
business areas that the publisher can focus on to accomplish his goal. The
following are four examples of marketing efforts that the publisher may
consider.
● An increase in his investment in search marketing to drive traffic to his site
● Advertisement of his site via online banners on others' sites
● Launching a promotional activity such as a sweepstake on the radio to drive traffic to his site
● Deployment of an online blog on his site to increase traffic volume.
The choices are numerous and the publisher needs to select a main focus
by assessing the advantages and disadvantages of each option.
The metrics used to measure the success of a marketing effort need
to be consistent with the business goals. In this example, if the goal is to
increase advertising revenue by 15% over the next three months then the
appropriate return metric is clearly the advertising revenue.
Step three: translation of business issues
into technical problems
The third step in an effective data mining thought process is the translation of business issues into technical problems. Wrong translation will
lead to waste of resources and opportunities. Continuing with the online
Introduction to Data Mining
publisher example: If the focus business area is to advertise on other web
sites to promote his own, the publisher needs to determine which of those
sites are most appropriate. As an example of a technical answer to the
question of where the publisher can place his advertisement, consider the
following approach.
The publisher can segment the traffic to his web site by referral sources.
A referral site or source is the origination site that leads particular visitors to the publisher's site. Some visitors may have arrived at the publisher's site via a Google search or a Yahoo search. In this case, either the
Google or the Yahoo search is the referral source. The publisher can then
determine which of the referral sources are the best traffic sources based
on traffic volume, traffic growth, visitor profile, and cost. He can then
emphasize investment in the most effective referral sources and actively
look for new referral sources with similar characteristics. All of the above
analysis requires a web analytic tool. Therefore, expertise in the selection,
deployment, and creation of reports from a web analytic tool needs to be
brought into the analysis process.
Step four: selection of appropriate data mining
techniques and software tools
Data mining techniques are based on analytic methods or algorithms, such as logistic regression and decision trees. Data mining software is
an application that implements data mining techniques such that the user
does not need to write the data mining algorithms he uses. SAS Enterprise
Miner, IBM Intelligent Miner, SPSS Clementine, and Knowledge STUDIO
are examples of data mining software.
Step five: identification of data sources
Once the appropriate data mining technique and software have been
established, proper data sources need to be identified to effectively
leverage data mining. For example, historical customer purchase data is
required for conducting cross-sell or up-sell analysis. The following is a
list of common data sources for data mining (Rud 2001).
● Internal sources
– Customer databases
– Transaction databases
– Marketing history databases
– Solicitation mail and phone tapes
– Data warehouse
● External sources
– Third-party list rentals
– Third-party data appends
Additional data sources are:
● Enterprise resource planning (ERP) systems
● Point of sales (POS) systems
● Financial databases
● Customer relationship management (CRM) systems
● Supply chain management (SCM) systems, such as SAP and PeopleSoft
● Marketing research and intelligence databases
● Campaign management systems
● Advertising servers
● E-mail delivery systems
● Web analytic systems
● Web log files
● Call center systems
After the appropriate data sources are identified, it is essential to make sure that the data is cleansed and standardized in preparation for data mining analysis.
Step six: conduction of analysis
There are three stages in data mining analysis: model building, model validation, and real-life testing. A comprehensive analysis must include all three stages. Skipping the model validation and real-life testing stages may increase the risk of rolling out unstable models. The data needs to
be divided into two subsets, one for model building and the other for
validation.
Model building
A subset of the available data is used for model building. A common
practice is to use 50–70% of the data for model building and the remaining data for model validation. In general, several models are built and the
best ones are chosen based on statistics measuring model effectiveness. If
the R-squared (R2) statistic is used to evaluate the effectiveness of multiple regression models, then the model with the highest R2 is selected.
Model validation
After a subset of the data has been used for model building, the remaining data is used for validation. One common mistake is to build models
and validate them on the same set of data. This is a serious error that can
artificially inflate the power of the models and make validation results
look much better than they actually are. It is very important to conduct
out-of-sample (sample referring to the subset of data used for model
building) validation. If a model works well with the validation test, it will
likely be successful in real-life tests.
Real-life testing
The best way to tell if a model works is to try it out with a small-scale real
life test. To test a targeting model, marketing promotions need to be sent
to both a control group and a test group. A control group consists of a random selection of all available prospects. A test group consists of the prospects that the model predicts are more likely to respond. A comparison in
the response rates between the test group and the control group provides
insights into the robustness of a targeting model. If the test group has a
higher response rate than that of a control and the result is statistically
significant, then the model is robust. If the test fails, close examination of
the model needs to be conducted to understand why the model does not
work in a real-life situation. Real-life testing is a roll out prerequisite.
Step seven: translation of analytical results into
actionable business recommendations
Inferring actionable business recommendations from model results
requires explaining the main conclusions of the analysis in nontechnical
terms. Throughout the various case studies from Chapter 8 to Chapter 10,
we provide specific examples on how to translate analytical results into
actionable business recommendations.
■ Overview of data mining
techniques
The foundations for the development of different data mining techniques
are statistical objectives and available data types. There are two common
statistical objectives, analysis of dependence and analysis of interdependence (Dillon and Goldstein 1984):
● Analysis of dependence: This type of analysis is used to explore relationships between dependent variables and independent variables.
● Analysis of interdependence: This method is used to explore relationships among independent variables.
The two common underlying data types of independent and dependent
variables are metric and nonmetric.
Basic data exploration
A preliminary data exploration is required before building any sophisticated data mining models. Occasionally, a basic data exploration is sufficient to address a business question. For instance, by regarding web
traffic as a time series, we may be able to identify visible spikes in traffic
pattern that coincide with particular marketing activities. By plotting one
customer attribute such as income against customer behavior such as grocery purchases, we might spot a distinct pattern of correlation between
income and grocery shopping. Data exploration should always be the
first step before building any model. Variables that appear interesting and
relevant at the data exploration stage will likely show up as significant
contributors in the final model.
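As a minimal illustration of this kind of exploration, the sketch below (assuming pandas and matplotlib, with made-up column names and values) summarizes a small customer table and plots income against grocery spending, the sort of plot described above.

```python
# Minimal exploratory sketch with an invented customer table
import pandas as pd
import matplotlib.pyplot as plt

customers = pd.DataFrame({
    "income":        [35000, 48000, 52000, 61000, 75000, 90000],
    "grocery_spend": [220,   260,   270,   310,   340,   400],
})

print(customers.describe())   # basic summary statistics per column
print(customers.corr())       # pairwise correlations

customers.plot.scatter(x="income", y="grocery_spend")
plt.show()
```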
Linear regression analysis
Regression analysis is a technique for quantifying the dependence between
dependent and independent variables. A particular type of regression
analysis, linear regression, is most frequently used in data mining. In its
more general formulation, linear regression establishes a linear relationship
between the dependent variables and the so-called regression parameters,
with the independent variables appearing in nonlinear functional forms.
A particularly popular form of linear regression occurs when the relationship between dependent and independent variables is itself linear. In this
case, linear regression with a single independent variable is called simple
linear regression, and regression with several independent variables is
called multiple linear regression.
The linear technique is widely used to predict a single dependent variable
(outcome variable) with one or multiple independent variables (predictive
variables). Linear regression problems are addressed with Ordinary Least
Squares (OLS) and Maximum Likelihood Estimation (MLE).
Simple linear regression
In simple linear regression, there is a single dependent variable and
a single independent variable. There is an implied linear relationship
between the two variables. Figure 7-2 is a graphical representation of this
relationship.
The mathematical formula for simple regression is

$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i \tag{7.1}$$

where i = 1, 2, 3, 4, …, n, Xi is the value of independent variable X, and Yi is the value of dependent variable Y. β0 and β1 are regression parameters and εi is the error term. The number of degrees of freedom is n − 1.
Figure 7-2 Illustrating the linear relationship between X and Y: the regression line crosses the vertical axis at (0, β0) and has slope β1 = dy/dx; εi is the vertical deviation of an observation (xi, yi) from the line.
The estimator of Yi is denoted as Ŷi and is called the regression line. If β̂0 is the estimator of β0 and β̂1 is the estimator of β1, the regression line is given by the following expression:

$$\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i \tag{7.2}$$

The estimator of β1 is (Neter, Wasserman, and Kutner 1990)

$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n}(X_i - \bar{X})^2} \tag{7.3}$$

where X̄ and Ȳ are the means of X and Y, respectively.

The variance of β̂1 is

$$\mathrm{var}(\hat{\beta}_1) = \frac{\sigma^2}{\sum_{i=1}^{n}(X_i - \bar{X})^2} \tag{7.4}$$

where σ² is the variance of variable Y.
The estimator of β0 can be shown to be (Neter, Wasserman, and Kutner 1990)

$$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X} \tag{7.5}$$

The variance of β̂0 is

$$s^2(\hat{\beta}_0) = \sigma^2\, \frac{\sum_{i=1}^{n} X_i^2}{n \sum_{i=1}^{n}(X_i - \bar{X})^2} \tag{7.6}$$

A significance test using t statistics can be applied to determine whether β̂0 and β̂1 are statistically significant. The t statistic for β̂0 is t(β̂0), expressed as β̂0 / s(β̂0). In the case where a 95% confidence level is used for the test, if t(β̂0) is greater than t0.025, n−2 or less than −t0.025, n−2, then β̂0 is statistically significant. The 95% confidence interval of β̂0 is

$$\left[\hat{\beta}_0 - t_{0.025,\,n-2}\,\sigma\sqrt{\frac{\sum_{i=1}^{n} X_i^2}{n\sum_{i=1}^{n}(X_i - \bar{X})^2}},\;\; \hat{\beta}_0 + t_{0.025,\,n-2}\,\sigma\sqrt{\frac{\sum_{i=1}^{n} X_i^2}{n\sum_{i=1}^{n}(X_i - \bar{X})^2}}\right]$$

The t statistic for β̂1 is t(β̂1), expressed as

$$t(\hat{\beta}_1) = \frac{\hat{\beta}_1}{s(\hat{\beta}_1)} \tag{7.7}$$

In the case where a 95% confidence level is used for the test, if t(β̂1) is greater than t0.025, n−2 or less than −t0.025, n−2, then β̂1 is statistically significant. The 95% confidence interval of β̂1 is given by

$$\left[\hat{\beta}_1 - t_{0.025,\,n-2}\,\frac{\sigma}{\sqrt{\sum_{i=1}^{n}(X_i - \bar{X})^2}},\;\; \hat{\beta}_1 + t_{0.025,\,n-2}\,\frac{\sigma}{\sqrt{\sum_{i=1}^{n}(X_i - \bar{X})^2}}\right]$$
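The estimators and the t test described above follow directly from Eqs. (7.3), (7.5), and (7.7). The following is a minimal sketch with made-up data, assuming NumPy and SciPy; the error variance is estimated from the residuals, which is an assumption beyond the formulas quoted in the text.

```python
# Minimal sketch: simple linear regression via the closed-form estimators
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])   # invented independent variable
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.9])   # invented dependent variable
n = len(x)

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)  # Eq. (7.3)
b0 = y.mean() - b1 * x.mean()                                               # Eq. (7.5)

resid = y - (b0 + b1 * x)
sigma2 = np.sum(resid ** 2) / (n - 2)            # estimated error variance
se_b1 = np.sqrt(sigma2 / np.sum((x - x.mean()) ** 2))

t_b1 = b1 / se_b1                                # Eq. (7.7)
t_crit = stats.t.ppf(0.975, df=n - 2)            # two-tailed 95% critical value

print(b0, b1, t_b1, abs(t_b1) > t_crit)
```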
Key assumptions of linear regression
Simple linear regression relies on four key assumptions that need to be
satisfied for conclusions to apply (Neter, Wasserman, and Kutner 1990).
Assumption 1: The mean of the error term εi, conditional on Xi, is zero:

E(εi | Xi) = 0, where i = 1, 2, 3, 4, …, n.

Assumption 2: The covariance between the error terms is zero:

cov(εi, εj) = 0, where i ≠ j, i = 1, 2, 3, 4, …, n, and j = 1, 2, 3, 4, …, n.

Assumption 3: The variance of εi is constant (a situation referred to as homoscedasticity):

var(εi) = σi² = σj² = var(εj), where i ≠ j, i = 1, 2, 3, 4, …, n, and j = 1, 2, 3, 4, …, n.

Assumption 4: The covariance between Xi and εi is zero, namely

cov(Xi, εi) = 0, where i = 1, 2, 3, 4, …, n.
Multiple linear regression
In multiple linear regression, there is a single dependent variable and more than one independent variable. We can describe a multiple regression model with p − 1 independent variables as follows:

$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + \cdots + \beta_{p-1} X_{p-1,i} + \varepsilon_i \tag{7.8}$$

for i = 1, 2, 3, 4, …, n, and p ≥ 3.
We can also express a multiple regression equation in matrix form (Dillon and Goldstein 1984):

$$\begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix} = \begin{pmatrix} \beta_0 + \beta_1 X_{11} + \beta_2 X_{21} + \beta_3 X_{31} + \cdots + \beta_{p-1} X_{p-1,1} + \varepsilon_1 \\ \beta_0 + \beta_1 X_{12} + \beta_2 X_{22} + \beta_3 X_{32} + \cdots + \beta_{p-1} X_{p-1,2} + \varepsilon_2 \\ \vdots \\ \beta_0 + \beta_1 X_{1n} + \beta_2 X_{2n} + \beta_3 X_{3n} + \cdots + \beta_{p-1} X_{p-1,n} + \varepsilon_n \end{pmatrix} \tag{7.9}$$

Equation 7.9 can be expanded to the following:

$$\begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix} = \begin{pmatrix} 1 & X_{11} & X_{21} & X_{31} & \cdots & X_{p-1,1} \\ 1 & X_{12} & X_{22} & X_{32} & \cdots & X_{p-1,2} \\ \vdots & \vdots & \vdots & \vdots & & \vdots \\ 1 & X_{1n} & X_{2n} & X_{3n} & \cdots & X_{p-1,n} \end{pmatrix} \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_{p-1} \end{pmatrix} + \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix} \tag{7.10}$$

In matrix form, this can be written as

$$Y = X\beta + \varepsilon \tag{7.11}$$

The estimated parameter matrix is

$$\hat{\beta} = (X'X)^{-1} X'Y \tag{7.12}$$

where X and Y are matrices whose entries are the realized values of the corresponding random variables.

The covariance matrix of the estimated β̂ matrix is

$$s^2(\hat{\beta}) = \frac{Y'Y - \hat{\beta}' X'Y}{n - p}\,(X'X)^{-1} \tag{7.13}$$

The standard deviation matrix of the estimated β̂ matrix is s(β̂), a diagonal matrix whose entries are the square roots of the diagonal elements of s²(β̂).

The 95% confidence interval of the estimated β̂ matrix is as follows:

$$\left[\hat{\beta} - t_{0.025,\,n-p-1}\, s(\hat{\beta}),\;\; \hat{\beta} + t_{0.025,\,n-p-1}\, s(\hat{\beta})\right]$$
Goodness of fit measure R2 and the F statistic
The term R² is also called the multiple coefficient of determination. R² measures the fraction of the total variance of the sample data explained by the regression model, and is given by the ratio of the variance explained by the multiple regression (Sum of Squares of Regression, or SSR) to the total variance (Total Sum of Squares, or TSS). The values of R² range from zero to one. The difference between TSS and SSR is called the Sum of Squares of Errors, or SSE. The higher R² is, the higher the fraction of the sample variance explained by the linear regression model. R² is given by the expression

$$R^2 = \frac{SSR}{TSS} = \frac{\sum(\hat{y}_i - \bar{y})^2}{\sum(y_i - \bar{y})^2} = \frac{TSS - SSE}{TSS} = 1 - \frac{SSE}{TSS} \tag{7.14}$$
As the number of independent variables increases in a particular model, the coefficient of determination tends to increase. When an increase in R² is due to the increase in the number of independent variables rather than to the incremental explanatory power of the additional independent variables, the model power is inflated. To avoid adoption of a model with inflated explanatory power, the adjusted R² can be used instead of R²:

$$\text{Adjusted } R^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1} \tag{7.15}$$

where p is the number of independent variables.

The F statistic is another statistical measure of the robustness of a multiple regression. The F statistic is obtained by dividing the Mean Squared Regression (MSR) by the Mean Squared Error (MSE) (Neter, Wasserman, and Kutner 1990). MSR is defined as SSR divided by its degrees of freedom, p. MSE is SSE divided by its degrees of freedom, n − p − 1:

$$F = \frac{MSR}{MSE} = \frac{SSR/p}{SSE/(n - p - 1)} \tag{7.16}$$

We reject the null hypothesis that all the parameter estimates are zero if the F statistic of a multiple regression model is greater than the critical F value, F1−α, p, n−p−1, where α is the significance level.
Additional regression techniques have been developed over the years
to facilitate selection of independent variables. These techniques include
backward, forward, and stepwise selection methods. Chapter 12 of the
book Applied Statistical Methods (Neter, Wasserman, and Kutner 1990)
has an in-depth discussion of these methods.
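The matrix formula of Eq. (7.12) and the fit measures of Eqs. (7.14) through (7.16) can be computed as in the sketch below, which assumes NumPy and uses invented data for two independent variables plus an intercept.

```python
# Minimal sketch: multiple regression via Eq. (7.12) with R2, adjusted R2, and F
import numpy as np

# design matrix with an intercept column of ones and two predictors (made-up data)
X = np.array([[1, 2.0, 1.0],
              [1, 3.0, 2.0],
              [1, 5.0, 2.0],
              [1, 6.0, 4.0],
              [1, 8.0, 5.0],
              [1, 9.0, 7.0]])
y = np.array([5.0, 7.5, 10.0, 13.5, 17.0, 20.5])
n = X.shape[0]
p = X.shape[1] - 1                          # number of independent variables

beta = np.linalg.inv(X.T @ X) @ X.T @ y     # Eq. (7.12)

y_hat = X @ beta
sse = np.sum((y - y_hat) ** 2)
tss = np.sum((y - y.mean()) ** 2)
ssr = tss - sse

r2 = 1 - sse / tss                              # Eq. (7.14)
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)   # Eq. (7.15)
f_stat = (ssr / p) / (sse / (n - p - 1))        # Eq. (7.16)

print(beta, r2, adj_r2, f_stat)
```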
Cluster analysis
Cluster analysis is used to uncover interdependence between members
of a sample. In this context, by members we mean objects of study, such
as individuals or products, and by sample we mean the collection of
such individuals or products used for conducting the study. By member
attributes, we mean variables that describe the features and characteristics of members. For instance, age and income are member attributes of
individuals. Through cluster analysis, members with similar values in
the variables under analysis are grouped into clusters. Each member can
only belong to one cluster. Cluster analysis is widely used in customer
segmentation. Identification of members with similar characteristics is
the key to cluster analysis. The following section provides an overview
on how to measure similarity between members of a sample.
Measurement of similarity between sample members
Comprehension of the concept of similarity is the key to the understanding of cluster analysis. Various criteria, such as distance and correlation,
can be used to measure similarity between sample members. In this chapter, we will focus on distance as a similarity measure of members.
The Euclidean distance (Dillon and Goldstein 1984) measures the distance between two sample members, i and j, with p attributes. The shorter the distance, the more similar the two members are to each other. If we assume that the realized values of the attributes of a sample member i can be represented by the vector Xi = (X1i, X2i, X3i, …, Xpi) and the values of the attributes of member j can be represented by the vector Xj = (X1j, X2j, X3j, …, Xpj), then the Euclidean distance between the two members is

$$d = \sqrt{\sum_{k=1}^{p}(X_{ki} - X_{kj})^2} \tag{7.17}$$

The Mahalanobis distance (Dillon and Goldstein 1984) is another method of measuring distances between members, and has the advantage over the Euclidean distance that it takes into consideration the correlation between the attributes of the members. The Mahalanobis distance, m, is defined by

$$m = \sqrt{(X_i - X_j)'\, S^{-1}\, (X_i - X_j)} \tag{7.18}$$

where S is the covariance matrix of the member attributes.
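Both distances are available in SciPy. The following minimal sketch uses invented attribute vectors (age and income in thousands) for two members and estimates the covariance matrix S from a small made-up sample.

```python
# Minimal sketch: Euclidean and Mahalanobis distances between two members
import numpy as np
from scipy.spatial.distance import euclidean, mahalanobis

xi = np.array([35.0, 75.0])   # member i: age, income (invented)
xj = np.array([42.0, 60.0])   # member j: age, income (invented)

# small invented sample used only to estimate the attribute covariance matrix S
sample = np.array([[30.0, 55.0],
                   [35.0, 75.0],
                   [42.0, 60.0],
                   [50.0, 90.0],
                   [28.0, 45.0]])
S = np.cov(sample, rowvar=False)

d_euclid = euclidean(xi, xj)                       # Eq. (7.17)
d_mahal  = mahalanobis(xi, xj, np.linalg.inv(S))   # Eq. (7.18)

print(d_euclid, d_mahal)
```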
Clustering techniques comprise hierarchical and partitioning methods
(Dillon and Goldstein 1984). Hierarchical methods, in their turn, can be
classified as either agglomerative or divisive.
Agglomerative methods start out by treating each member as a cluster
and then grouping members with particular similarities into clusters. Once
a member is assigned to a cluster, it cannot be reassigned to another cluster.
Divisive methods, commonly known as decision trees, begin by splitting the members into two or more groups. Each group can be further
split into two or more subgroups, with the splitting process continuing
until a preselected statistic reaches an assumed critical value.
Hierarchical agglomerative methods
There are four common approaches under hierarchical agglomerative
methods (Dillon and Goldstein 1984).
● Nearest neighbor (single linkage)
● Furthest neighbor (complete linkage)
● Average linkage
● Ward's error sum of squares.
The nearest-neighbor approach defines the distance between a member
and a cluster of members as the shortest distance between the member and
the cluster. The furthest neighbor approach defines the distance between
a member and a cluster of members as the longest distance between
the member and the cluster. The average linkage approach defines the
distance between a member and a cluster of members as the average distance between the member and the cluster.
This distance can be any statistical distance measure, such as Euclidean
and Mahalanobis. The three approaches share the common rule that members and clusters that are close to one another are grouped into large clusters.
The nearest-neighbor approach begins by grouping the two members
with the shortest distance into a cluster. The approach next calculates the
distance between this cluster and each of the remaining members and
continues to group members and clusters that are closest to one another.
The process continues until a cluster containing all members is formed.
We next discuss an example of the nearest-neighbor method applied to
a set of five members (A, B, C, D, and E). Figure 7-3 shows a matrix whose
entries represent the distances between any two members.
Figure 7-3 Distance matrix illustration.

      A    B    C    D    E
A     0    4   65   12    9
B     4    0   45   34   10
C    65   45    0   17   22
D    12   34   17    0   12
E     9   10   22   12    0
As we can see in Figure 7-3, the distance between members A and B is
the shortest one. Therefore, we start out by grouping these two members
into the first cluster. After members A and B are grouped into one cluster,
we calculate the distances between this cluster and the remaining members, as shown in Figure 7-4.
Figure 7-4 Distances after formation of the first cluster.

      A    B    C    D    E
A     0    0   65   12    9
B     0    0   45   34   10
C    65   45    0   17   22
D    12   34   17    0   12
E     9   10   22   12    0
The nearest distance between the first cluster (AB) and each of the
remaining members and the distances between the remaining members
are as follows
(AB) and C: min(65, 45) = 45
(AB) and D: min(12, 34) = 12
(AB) and E: min(9, 10) = 9
C and D: 17
C and E: 22
D and E: 12
From this we infer that A, B, and E now form a cluster since nine is the shortest distance. The distances between this new cluster and the remaining members are:

(ABE) and C: min(65, 45, 22) = 22
(ABE) and D: min(12, 34, 12) = 12
C and D: 17
Since 12 is the shortest distance, this means that A, B, E, and D
now form a cluster and C remains as a cluster of its own. In conclusion,
the nearest-neighbor method has created two final clusters. One cluster
consists of members A, B, D, and E, and the other cluster contains member C only.
We next apply the furthest neighbor approach to the same set of members. The distance between members A and B is the shortest so these two
are grouped into the first cluster from the outset. After members A and
B are grouped into one cluster, the distances between the cluster and the
remaining members are calculated.
(AB) and C: max(65, 45) = 65
(AB) and D: max(12, 34) = 34
(AB) and E: max(9, 10) = 10
C and D: 17
C and E: 22
D and E: 12
From this we infer that A, B, and E now form a cluster since ten is the
shortest distance. The distances between the cluster and the remaining members are:

(ABE) and C: max(65, 45, 22) = 65
(ABE) and D: max(12, 34, 12) = 34
C and D: 17
This means that C and D form a cluster since seventeen is the shortest distance. In conclusion, the furthest neighbor method has created two
final clusters. One cluster consists of members A, B, and E, and the other
cluster contains members C and D.
We next apply the average linkage approach to the same member sample. Since the distance between members A and B is the shortest, these two
members are grouped into the first cluster from the outset. After members
A and B are grouped into one cluster, the distances between the cluster
and the remaining members are calculated and the results are as follows.
(AB) and C: average(65, 45) = 55
(AB) and D: average(12, 34) = 23
(AB) and E: average(9, 10) = 9.5
C and D: 17
C and E: 22
D and E: 12
From this we conclude that A, B, and E now form a cluster since 9.5 is
the shortest distance. The distances between the cluster and the remaining members are
(ABE) and C: average(65, 45, 22) = 44
(ABE) and D: average(12, 34, 12) = 19.3
C and D: 17
Members C and D form a cluster since seventeen is the shortest distance. Members A, B, and E form the other cluster.
Ward’s Error Sum of Squares (Ward’s ESS) is a clustering approach that
creates clusters by minimizing the sum of the within-cluster variance. The
within-cluster variance is defined as (Dillon and Goldstein 1984)
$$ESS = \sum_{j=1}^{k}\left(\sum_{i=1}^{n_j} X_{ij}^2 - \frac{1}{n_j}\left(\sum_{i=1}^{n_j} X_{ij}\right)^2\right) \tag{7.19}$$
Here, Xij is the attribute value of member i in cluster j.
We next discuss an example where we apply the Ward’s approach to a
sample of four members (A, B, C, and D). From the outset, each member
forms its individual cluster. The attribute values of the four members are
as follows.
A: 4
B: 10
C: 5
D: 20
We first compute the Ward’s ESS for every possible cluster that can be
formed with two members in the sample.
ESS of members A and B = 4² + 10² − ½(4 + 10)² = 18
ESS of members A and C = 4² + 5² − ½(4 + 5)² = 0.5
ESS of members A and D = 4² + 20² − ½(4 + 20)² = 128
ESS of members B and C = 10² + 5² − ½(10 + 5)² = 12.5
ESS of members B and D = 10² + 20² − ½(10 + 20)² = 50
ESS of members C and D = 5² + 20² − ½(5 + 20)² = 112.5
Members A and C form a cluster since the Ward’s ESS of their cluster is
the lowest. Next we compute the Ward’s ESS for any possible cluster that
consists of the (AC) cluster and each of the remaining members.
ESS of the cluster that consists of members A, C, and B is 4² + 10² + 5² − ⅓(4 + 10 + 5)² = 20.67

ESS of the cluster that consists of members A, C, and D is 4² + 5² + 20² − ⅓(4 + 5 + 20)² = 160.67
The Ward’s ESS for the cluster of members A, C, and B is smaller than
that of the clusters with members A, C, and D, and B and D. Therefore,
members A, B and C form a cluster, and member D remains by itself in
one cluster.
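The nearest-neighbor, furthest-neighbor, and average linkage results derived by hand above can be reproduced with SciPy's hierarchical clustering routines. The sketch below feeds the Figure 7-3 distance matrix to the single, complete, and average linkage methods and cuts each tree into two clusters; Ward's method is omitted here because SciPy expects raw observations rather than a precomputed distance matrix for it.

```python
# Minimal sketch: agglomerative clustering of the five members in Figure 7-3
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

labels = ["A", "B", "C", "D", "E"]
D = np.array([[ 0,  4, 65, 12,  9],
              [ 4,  0, 45, 34, 10],
              [65, 45,  0, 17, 22],
              [12, 34, 17,  0, 12],
              [ 9, 10, 22, 12,  0]], dtype=float)

condensed = squareform(D)   # condensed distance vector expected by linkage

for method in ("single", "complete", "average"):
    Z = linkage(condensed, method=method)
    clusters = fcluster(Z, t=2, criterion="maxclust")   # cut into two clusters
    print(method, dict(zip(labels, clusters)))
```

The two-cluster cuts match the groupings derived in the text: {A, B, D, E} and {C} for single linkage, and {A, B, E} and {C, D} for complete and average linkage.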
Hierarchical divisive methods: AID, CHAID, and CART
Hierarchical divisive methods start out by splitting a group of members
into two or more subgroups and proceed in the same splitting fashion
based on predetermined statistical criteria. The most common divisive
methods are decision tree approaches such as Automatic Interaction
Detection (AID), Chi-Square Automatic Interaction Detection (CHAID),
and Classification and Regression Tree (CART).
A decision tree approach starts out at the root of a tree where all members reside and splits the members into different subgroups (called
branches or nodes). A tree is built in such a way that the variance of the
dependent variable is maximized between groups and is minimized
within groups. For instance, a group of consumers may be split into different age (independent variable) groups to maximize the variance of the
household income (dependent variable) between the age groups.
Figure 7-5 illustrates what a small decision tree looks like. There are two
splits in the tree. The nodes that stop splitting are called terminal nodes.
Nodes one, three, four, and five are terminal nodes.
AID is a divisive approach that splits a group of members into binary
branches. While in this approach the dependent variable needs to be
metric, the independent variables can be either nonmetric or metric.
Figure 7-5 Decision tree output illustration: split 1 divides the root (node 0) into nodes 1, 2, and 3; split 2 divides node 2 into nodes 4 and 5.
Chi-Square Automatic Interaction Detection (CHAID) is more flexible than AID in that CHAID allows a group of members to be split into
two or more branches. Given its flexibility, CHAID is more widely used
in data mining than AID. The following are the key characteristics of
CHAID.
● The dependent variable is usually nonmetric.
● The independent variables are either nonmetric or metric data with no natural zero value and no specific distribution constraints, but their possible number of groups should be no more than 15 (Struhl 1992).
● The chi-square statistic is used to determine whether to further split a node.
We next discuss an example where CHAID is used to better understand the member attributes that are highly associated with responsiveness to a direct mail promotion. The dependent variable, a binary variable with a value of ‘Yes’ or ‘No’, denotes the existence or absence of a response to the promotion. Assume there are 10,000 individuals in the marketing sample. Among them, 6000 individuals respond and 4000 do not respond to the promotion. Therefore, the overall response rate to the direct mail promotion is 60%. Assume there are three responder
attributes (independent variables) of interest, age, income, and gender.
The CHAID approach assesses all the values of age, income, and gender
to create splits with significant chi square statistics.
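Before looking at the output, it may help to see how a single candidate split is scored. The following Python sketch (ours, not from the original text; the counts correspond to the age split shown in Figure 7-6) computes the chi-square statistic for the candidate age split:

import numpy as np
from scipy.stats import chi2

# Responder / non-responder counts by candidate age group (Figure 7-6)
counts = np.array([[800, 500],       # age under 25
                   [4200, 2000],     # age 25-45
                   [1000, 1500]])    # age over 45

row_totals = counts.sum(axis=1, keepdims=True)
col_totals = counts.sum(axis=0, keepdims=True)
expected = row_totals * col_totals / counts.sum()      # counts expected if age had no effect
chi_square = ((counts - expected) ** 2 / expected).sum()
dof = (counts.shape[0] - 1) * (counts.shape[1] - 1)
p_value = chi2.sf(chi_square, dof)
print(chi_square, dof, p_value)
# A very small p-value means response rates differ across age groups, so CHAID
# would accept this split and then try to split each child node further.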
Figure 7-6 illustrates the output of CHAID, where there are four terminal nodes (nodes one, three, four, and five) in the tree. Age and gender are the drivers of responsiveness. Female subjects aged between 25 and 45 are the most responsive group, followed by male subjects in the same age group. Subjects aged over 45, regardless of gender, are the least responsive individuals.
We next review another example of a CHAID application to marketing.
Assume a marketing manager at a gourmet cookware store wishes to find
out which types of customers are more likely to purchase a high-end BBQ
grill. The store has collected three pieces of demographic information
about its customers: household income, marital status of the household
head, and the number of children in the household. Assume the store has
five years of detailed transaction history, and there are a total of 25,000
customers in the store database. The dependent variable, existence of a
past purchase of a high-end grill, is a nominal variable with two possible values: purchase or no purchase. The three independent variables are
household income, marital status of the household head, and the number
of children in the household. CHAID is used to split the 25,000 customers into subgroups based on household income, the marital status of the
household head, and the number of children.
Figure 7-6 CHAID analysis applied to direct mail promotion responses.
Node 0 (all subjects): total 10,000; responders 6000; non-responders 4000; response rate 60%. The root splits on age:
Node 1 (age under 25): total 1300; responders 800; non-responders 500; response rate 62%.
Node 2 (age 25–45): total 6200; responders 4200; non-responders 2000; response rate 68%.
Node 3 (age over 45): total 2500; responders 1000; non-responders 1500; response rate 40%.
Node 2 splits further on gender:
Node 4 (female): total 4000; responders 3000; non-responders 1000; response rate 75%.
Node 5 (male): total 2200; responders 1200; non-responders 1000; response rate 55%.
Once the marketing manager understands which customer attributes
are highly associated with the purchase of a BBQ grill, he can contact a list
broker to rent lists of prospects with similar attributes. These prospects
will form an ideal target audience for a BBQ grill marketing campaign.
Consider yet another example of a CHAID application to marketing.
Assume a high tech company specializing in enterprise networking security plans to launch an e-mail promotion. The promotion with a gift card
offer will target a group of technology magazine subscribers. The purpose
of the campaign is to acquire new customers who may currently be using
a competitor's product and are considering additional purchases of similar products. Assume that there are over 50,000 subscribers to the technology magazine. The gift card offer has a value of $50 per subscriber, so the
offer cost will amount to $2.5 million if all the subscribers are targeted.
In other words, in this particular situation it would be very costly to target all of the subscribers. The marketing manager at the high tech firm
decides to segment the magazine subscriber list based on six attributes.
159
160
Data Mining and Market Intelligence
● The industry that a subscriber works for.
● The size of the company that a subscriber works for: for instance, the company size is 5000 if a subscriber works for a company with 5000 employees.
● Subscriber's office location type, such as branch office, headquarters, and single location.
● Subscriber's role in network security purchase decision making, such as authorizing and influencing.
● Subscriber's job function, such as information systems, marketing, and accounting.
● Subscriber's job title, such as CTO, IT manager, and marketing VP.
The marketing manager then uses the CHAID approach to analyze his
internal database and to construct a profile of past security product buyers. The buyer profile indicates that networking security product buyers
tend to work for large companies (company size of 1000 or more) in the
banking industry. These buyers also tend to be IT managers and IT directors working at branch offices.
Based on the above profile, the marketing manager next instructs the
magazine publisher to select a targeted list of subscribers that are IT
managers or IT directors who work at the branch offices of large banks.
Assume the magazine company has a total of 2000 subscribers that meet
the selection criteria. The total gift card offer cost is 2000 multiplied by
$50, which amounts to $100,000. This cost is within the company’s program budget.
Classification and Regression Tree (CART) is another decision tree
approach. In this approach, both the dependent and the independent variable can be either metric or nonmetric (Struhl 1992). Like CHAID, CART is
also widely used in data mining for marketing. Although the original CART
algorithm allowed two-node splits only, there are now CART software
implementations with revised algorithms that offer the flexibility of creating
splits into more than two nodes. The following is a list of key characteristics
of CART (Breiman, Friedman, Olshen, and Stone 1998).
● The dependent variable is metric or nonmetric with no specific distribution constraints.
● The independent variables can be either nonmetric or metric with no specific distribution constraints.
● In cases where the dependent variable and independent variable are both metric, the relationship between them can be either linear or nonlinear.
● When the dependent variable is metric, accuracy statistic measures such as average least squares can be used to determine whether a node continues to split.
(1/n) Σ_{i=1}^{n} (Yᵢ − Ŷᵢ)²   (7.20)

Yᵢ is the observed value of dependent variable Y of member i, Ŷᵢ is the predicted value of variable Y, and n is the number of members in the node. A node is split to minimize this particular statistic.
When the dependent variable is nonmetric, the misclassification rate
(the percentage of cases being misclassified) is used to determine whether
a node should split further. The optimal tree is the one that minimizes the
overall misclassification rate of all nodes in the tree (Breiman, Friedman,
Olshen, and Stone 1998).
In the next example, CART is used to better understand the drivers for
customer purchases of a product. Assume three independent variables are
analyzed: age, income, and gender. The dependent variable is the average
annual purchase amount of a customer. Age and income are metric variables and gender is a nonmetric variable. There are a total of 5000 customers and the average purchase amount of these customers is $300.
Figure 7-7 CART output illustration. Node 0 (all 5000 customers, average purchase $300) splits on income at $50,000 into node 1 (3700 customers, average purchase $200) and node 4 (1300 customers, average purchase $585); node 1 splits further on age at 30 into node 2 (3000 customers, average purchase $100) and node 3 (700 customers, average purchase $628).

Figure 7-7 shows the final output of CART. There are three terminal nodes (nodes two, three, and four) in the tree, and the 5000 customers are
assigned into these three nodes. The customers in node one on average have a purchase amount of $200. The customers in node two on average have a purchase amount of $100, those in node three $628, and those in node four $585. CART has segmented the customers based on their purchase volume.
Partitioning methods
Partitioning methods assume that the initial number of clusters is predetermined. Unlike the hierarchical methods we discussed earlier, partitioning
techniques allow for the reassignment of members from one cluster to
another.
One of the best-known partitioning techniques is the K-Means clustering method (Dillon and Goldstein 1984), which starts out by grouping the
members into K clusters. There are numerous ways of creating these initial K clusters. Members with close proximity are grouped into the same
clusters, and then are moved from one cluster to another to minimize the
error of partition.
If Xᵢ,ⱼ,ₗ is the value of the jth attribute of member i in cluster l, X̄ⱼ,ₗ is the mean of the jth attribute in cluster l, p is the number of attributes, and n is the total number of members, the partition error is defined by

E = Σ_{i=1}^{n} Σ_{j=1}^{p} (Xᵢ,ⱼ,ₗ − X̄ⱼ,ₗ)²   (7.21)
We next discuss an example of how to create clusters based on the
K-means approach. Assume there are six students (A, B, C, D, E, and F) with
scores in three subjects: English, math, and music (as indicated in Table 7-1).
The value of K is set to three. The Euclidean distances of scores between
the students are shown in Figure 7-8.
Table 7-1 K-means clustering example – student score raw data

Student   English score   Math score   Music score
A         60              90           78
B         100             85           90
C         55              70           40
D         98              65           95
E         70              80           44
F         98              100          78
Figure 7-8 Student score distance matrix.

      A    B    C    D    E    F
A     0    42   43   49   37   39
B     42   0    69   21   55   19
C     43   69   0    70   18   65
D     49   21   70   0    60   39
E     37   55   18   60   0    48
F     39   19   65   39   48   0
Three initial clusters are formed based on the distance matrix in Figure 7-8: A, (BDF), and (CE). We next compute the mean scores in English,
math, and music by cluster. These mean scores, shown in Table 7-2, are
used to derive the error of partition of the clusters.
The error of partition of the initial three clusters, as defined by Eq. 7.21, is

E = (100 − 98.7)² + (98 − 98.7)² + (98 − 98.7)² + (85 − 83.3)² + (65 − 83.3)² + (100 − 83.3)² + (90 − 87.7)² + (95 − 87.7)² + (78 − 87.7)² + (55 − 62.5)² + (70 − 62.5)² + (70 − 75)² + (80 − 75)² + (40 − 42)² + (44 − 42)² = 942.51
New clusters are formed if E decreases as a result of moving one student from one cluster to another. The final cluster configuration is the one
with the lowest E.
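The same clustering can be reproduced with a library implementation. The sketch below (ours, not from the original text) applies scikit-learn's KMeans to the Table 7-1 scores; because K-means depends on its starting configuration, the final clusters may or may not coincide exactly with the hand-worked solution:

import numpy as np
from sklearn.cluster import KMeans

# Student scores from Table 7-1: columns are English, math, and music
scores = np.array([[60, 90, 78],     # A
                   [100, 85, 90],    # B
                   [55, 70, 40],     # C
                   [98, 65, 95],     # D
                   [70, 80, 44],     # E
                   [98, 100, 78]])   # F

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(scores)
print(kmeans.labels_)      # cluster assignment of students A-F
print(kmeans.inertia_)     # total within-cluster sum of squares (the partition error E)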
Principal component analysis
Principal component analysis is a data reduction technique that can
reduce the number of variables under analysis. The technique creates new
variables called principal components that are linear combinations of the
original variables. Principal components are uncorrelated to one another.
We assume that there are m principal components (PC) derived from p
original variables. Each principal component can be expressed as a linear
combination of the original variables.
Figure 7-9 Correspondence analysis process: a frequency table is transformed into row profiles and column profiles, which are then plotted together in a low-dimensional map.
Table 7-2 Mean scores by cluster

Cluster        Mean English score   Mean Math score   Mean Music score
A              60                   90                78
B, D, and F    98.7                 83.3              87.7
C and E        62.5                 75                42
PC₁ = w₁₁X₁ + w₁₂X₂ + w₁₃X₃ + … + w₁ₚXₚ
PC₂ = w₂₁X₁ + w₂₂X₂ + w₂₃X₃ + … + w₂ₚXₚ
⋮
PCₘ = wₘ₁X₁ + wₘ₂X₂ + wₘ₃X₃ + … + wₘₚXₚ   (7.22)

where PCᵢ is the ith principal component, wᵢⱼ is the coefficient of the jth original variable in the ith principal component, and Xⱼ is the jth original variable. It is required that the sum of the squares of the coefficients in each principal component is one (Brooks 2002). For the ith principal component, this translates into the constraint

wᵢ₁² + wᵢ₂² + wᵢ₃² + … + wᵢₚ² = 1   (7.23)

It is also required that the coefficient vectors of the principal components be orthogonal, namely,

wᵢ′wⱼ = 0   (7.24)

where wᵢ = [wᵢ₁, wᵢ₂, …, wᵢₚ], wⱼ = [wⱼ₁, wⱼ₂, …, wⱼₚ], and i ≠ j.

If Σ is the variance–covariance matrix of the original variables X and λᵢ is the ith eigenvalue of Σ, the following condition must be satisfied for a nontrivial solution to exist:

det(Σ − λᵢI) = 0   (7.25)

The corresponding eigenvector of λᵢ is the factor loading vector wᵢ. Given that Σ is a symmetric matrix, the resulting eigenvectors are orthogonal to one another. The length of the eigenvectors can be scaled to unit length, as given by Eq. 7.23.

The fraction of the total variance in the original variables that is explained by the ith principal component is given by

λᵢ / Σ_{i=1}^{p} λᵢ   (7.26)
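In practice the principal components are obtained from an eigendecomposition of the variance–covariance matrix. The following Python sketch (ours, with hypothetical data) follows Eqs. 7.22–7.26 directly:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))                  # hypothetical data: 200 members, 4 variables

cov = np.cov(X, rowvar=False)                  # p x p variance-covariance matrix
eigenvalues, eigenvectors = np.linalg.eigh(cov)
order = np.argsort(eigenvalues)[::-1]          # largest eigenvalue first
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

explained = eigenvalues / eigenvalues.sum()    # Eq. 7.26: fraction of variance per component
scores = (X - X.mean(axis=0)) @ eigenvectors   # principal component scores (Eq. 7.22)
print(explained.round(3))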
Factor analysis
Factor analysis is also a data reduction technique; it uncovers underlying factors, fewer in number than the original variables, that are common to the original variables. The original variables are linear combinations of these common factors. Notice the contrast with principal component
analysis, where the principal components are linear combinations of the
original variables.
With wᵢⱼ (also called the factor loading) the coefficient of the jth common factor in the ith original variable, fⱼ the jth common factor, and error terms εᵢ, we can express the original variables, Xᵢ, in terms of common factors as follows.

X₁ = w₁₁f₁ + w₁₂f₂ + w₁₃f₃ + … + w₁ₖfₖ + ε₁
X₂ = w₂₁f₁ + w₂₂f₂ + w₂₃f₃ + … + w₂ₖfₖ + ε₂
⋮
Xₘ = wₘ₁f₁ + wₘ₂f₂ + wₘ₃f₃ + … + wₘₖfₖ + εₘ   (7.27)

In matrix notation, Eq. 7.27 can be expressed as

X = wf + ε   (7.28)

with X an m × 1 matrix that contains the original variables, w an m × k matrix that contains all the coefficients of the k common factors in the m original variables, f a k × 1 matrix that contains the k common factors, and ε an m × 1 matrix with error terms. It is assumed that the error terms are uncorrelated with each other and with the common factors, namely

E(ε) = 0 and E(fε′) = 0

With Φ the covariance matrix of the common factors, and Θ the covariance matrix of the error terms, the variance of the original variables is given by the expression

var(X) = wE(ff′)w′ + var(ε) = wΦw′ + Θ   (7.29)

Assuming that the common factors have a variance of one and that they are not correlated with each other, then Φ = I (the identity matrix) and Eq. 7.29 becomes (Dillon and Goldstein 1984)

var(X) = ww′ + Θ, with total common-factor variance Σ_{i=1}^{m} Σ_{j=1}^{k} wᵢⱼ²   (7.30)

The fraction of the variance of the original variables explained by the common factors is given by

Σ_{i=1}^{m} Σ_{j=1}^{k} wᵢⱼ² / var(X)   (7.31)
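A library implementation can be used to estimate the loadings. The sketch below (ours, with hypothetical data) uses scikit-learn's FactorAnalysis; the attribute names are those of that library, not of the formulas above:

import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 6))                  # hypothetical data: 500 members, 6 variables

fa = FactorAnalysis(n_components=2).fit(X)
loadings = fa.components_.T                    # w: one row per original variable, one column per factor
specific_variances = fa.noise_variance_        # estimated error variances

# Fraction of total variance attributed to the common factors (in the spirit of Eq. 7.31)
explained = (loadings ** 2).sum() / X.var(axis=0).sum()
print(loadings.round(3))
print(round(explained, 3))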
Discriminant analysis
Discriminant analysis is a technique that examines the differences between two or more groups of members with respect to multiple member attributes
(Klecka 1980). The dependent variable is a variable that indicates the group
a member belongs to. An example of a dependent variable is a variable that
has three possible values: ‘high-achiever group’, ‘medium-achiever group’,
and ‘low-achiever group’. The independent variables are attributes associated with the members. Discriminant analysis can be used to predict which
group a particular member with given attributes belongs to.
To study the characteristics of each group using discriminant analysis,
we start out by creating discriminant functions.
Discriminant functions are linear combinations of the independent variables (member attributes), defined as follows.
D₁ = b₁₁X₁ + b₁₂X₂ + b₁₃X₃ + … + b₁ₚXₚ
D₂ = b₂₁X₁ + b₂₂X₂ + b₂₃X₃ + … + b₂ₚXₚ
⋮
Dₘ = bₘ₁X₁ + bₘ₂X₂ + bₘ₃X₃ + … + bₘₚXₚ   (7.32)

where Dᵢ is the ith discriminant function, bᵢₚ is the discriminant coefficient of the pth independent variable in the ith discriminant function, and Xₚ is the pth independent variable.
A discriminant function is created to maximize the ratio of its between-group variance and its within-group variance. The value of a discriminant function is referred to as a discriminant score. Only those independent variables that are predictive of the dependent variable are included in the discriminant functions. Such variables are called discriminating variables. Statistics such as Wilks' lambda, the chi-square, and the F statistic are used to determine which independent variables are predictive (Klecka 1980). Wilks' lambda is also used to assess the statistical significance of discriminant functions. The following is a list of key assumptions of discriminant analysis.
● Assumption 1: The dependent variable (group identity variable) is nonmetric.
● Assumption 2: The discriminating variables have a multivariate normal distribution.
● Assumption 3: The variance–covariance matrix of the discriminating variables is the same across groups.
● Assumption 4: Each member can belong to one and only one group.
There are similarities between linear regression, CHAID, and discriminant analysis in that all of these three techniques are used to explore the
dependence between a set of dependent and independent variables.
However, there are basic differences between these three techniques in
terms of assumptions (Dillon and Goldstein 1984). In a linear regression,
the dependent variable is assumed to be a normally distributed random
variable and the independent variables are assumed to be fixed. In discriminant analysis, it is assumed that the dependent variable is fixed and the
discriminating variables are normally distributed. In CHAID, there are no
distributional assumptions about the dependent or independent variables.
A further difference between CHAID and discriminant analysis is that discriminant analysis constructs discriminant functions as linear combinations of
the discriminating variables (independent variables), while CHAID does not
assume any such linear relationship. What CHAID and discriminant analysis
have in common is that both minimize misclassification by maximizing the
ratio of the variance between groups and the variance within groups.
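As an illustration (ours, with simulated data and made-up group labels), scikit-learn's LinearDiscriminantAnalysis fits discriminant functions of the form of Eq. 7.32 and predicts group membership for new members:

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
# Simulated data: two member attributes measured for three achievement groups
X = np.vstack([rng.normal([0, 0], 1.0, (50, 2)),
               rng.normal([3, 1], 1.0, (50, 2)),
               rng.normal([1, 4], 1.0, (50, 2))])
y = np.repeat(['low', 'medium', 'high'], 50)

lda = LinearDiscriminantAnalysis().fit(X, y)
print(lda.scalings_)               # discriminant coefficients (Eq. 7.32, up to scaling)
print(lda.predict([[2.5, 1.2]]))   # predicted group for a new member with given attributes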
Correspondence analysis
Correspondence analysis, also called dual scaling, is used to analyze
the association between two or more categorical variables and to visually represent this association in a low dimensionality diagram, called a
perceptual map. This method is particularly useful for analyzing large
contingency tables. In correspondence analysis, the variables under analysis are assumed to be categorical.
In his book, Applied Correspondence Analysis, Clausen proposes a step-by-step process for conducting a correspondence analysis, illustrated in Figure 7-9.
The steps in the process of a correspondence analysis with two categorical variables are as follows.
● Step one is the creation of a frequency table with the two categorical variables. Assume that the X variable has k possible values and the Y variable has l possible values. In this k × l frequency table, entry nij is the number of members whose X variable value equals i and whose Y variable value equals j. The number of members whose X variable equals i is ni, and the number whose Y variable equals j is nj; ni and nj are called the row total and column total, respectively.
● Step two is to set up a row profile table and a column profile table. The frequency table can be transformed into a row profile table by dividing each entry nij by the row total ni. The frequency table can be transformed into a column profile table by dividing each entry nij by the column total nj.
● Step three is to generate two key underlying dimensions for variables X and Y and to plot both variables on a two-dimensional map. The dimensions are selected based on the proportion of the variance of the original variables that these dimensions explain. The higher the proportion, the more significant the dimension. A detailed discussion of the mathematical derivation of the dimensions (Clausen 1998) is beyond the scope of this book.
We next discuss a correspondence analysis example. Assume a firm
conducts an online survey to measure the satisfaction level of visitors to
the firm’s website. The total number of visitors is 4492. The visitors are
classified into three types based on their familiarity with the site: first time
visitors, frequent visitors, and infrequent visitors. The visitors are asked
which of the following three features of the website is the most important
to them: content, navigation, and presentation. Correspondence analysis
will be used to analyze the survey response data and SAS will be used as
the data-mining tool.
In step one of the analysis, a contingency table (frequency table) is created based on visitor types and visitors’ selection of the most important
website feature (content, navigation, or presentation) as illustrated in
Table 7-3. Next the row profiles (row percentages) and the column profiles (column percentages) are created. Tables 7-4 and 7-5 show the row
profile and the column profile respectively.
The column profiles show that the infrequent visitors are likely to rate
navigation as an important driving factor of their satisfaction of the site.
Frequent visitors tend to identify content as the driving factor of their satisfaction with the site.
Table 7-3 Contingency table of visitor type and importance of the three website features

Visitor type         Content   Navigation   Presentation   Total
New visitor          230       100          100            430
Infrequent visitor   799       450          113            1362
Frequent visitor     2400      100          200            2700
Total                3429      650          413            4492
Table 7-4 Web visitors row profiles

Visitor type         Content   Navigation   Presentation   Total
New visitor          0.54      0.23         0.23           1
Infrequent visitor   0.59      0.33         0.08           1
Frequent visitor     0.89      0.04         0.07           1
Table 7-5 Web visitors column profiles

Visitor type         Content   Navigation   Presentation
New visitor          0.07      0.15         0.25
Infrequent visitor   0.23      0.70         0.27
Frequent visitor     0.70      0.15         0.48
Total                1         1            1
In step two of the analysis, two key dimensions are created. Figure 7-10
shows the SAS output of the analysis. The first dimension explains 87.89%
of the variance of the data, while the second explains 12.11% of the variance of the data.
In the third step of the analysis a two-dimensional correspondence map
is created. Table 7-6 and Table 7-7 show the row and column coordinates
of the two new dimensions.
The first dimension shows a large weight on navigation, an indication
of high association between this dimension and the navigation feature.
Therefore, we may label this dimension as ‘site navigation’. The second
dimension has a large weight on site presentation, an indication of high
Figure 7-10 Correspondence analysis key dimensions and their explanatory power (inertia and chi-square decomposition).

Dimension     Singular value   Principal inertia   Chi-square   Percent   Cumulative percent
Dimension 1   0.39754          0.15804             709.904      87.89     87.89
Dimension 2   0.14755          0.02177             97.797       12.11     100.00
Total                          0.17981             807.701      100.00
Degrees of freedom = 4
Table 7-6 Row coordinates of the two dimensions in correspondence analysis

Visitor type         Dimension 1   Dimension 2
New visitor          0.3900        0.4298
Infrequent visitor   0.5165        0.1152
Frequent visitor     0.3277        0.0103
Table 7-7 Column coordinates of the two dimensions in correspondence analysis

Website feature   Dimension 1   Dimension 2
Content           0.1995        0.0356
Navigation        0.9256        0.1033
Presentation      0.2000        0.4577
association between this dimension and the site presentation feature.
Therefore, we may label this dimension as ‘site presentation’.
Figure 7-11 shows the SAS output of the correspondence analysis. The
corresponding code is shown in Figure 7-12.
Figure 7-11 Correspondence analysis map, with dimension 1 (87.89%) on the horizontal axis and dimension 2 (12.11%) on the vertical axis. Infrequent visitors plot near navigation, frequent visitors near content, and new visitors near presentation.
*--- Create input data ---;
data sasuser.corres_ex;
   length visitor $30;
   input visitor $ content navigation preso;
   cards;
New_Visitors 230 100 100
Infrequent_Visitors 799 450 113
Frequent_Visitors 2400 100 200
;
run;

*--- Perform simple correspondence analysis ---;
proc corresp all data=sasuser.corres_ex outc=outcorres;
   var content navigation preso;
   id visitor;
run;

*--- Plot the simple correspondence analysis results ---;
%plotit(data=outcorres, datatype=corresp);
Figure 7-12
SAS code for generating correspondence map.
Analysis of variance
Analysis of Variance (ANOVA) is a statistical technique used to quantify
the dependence relationship between dependent and independent variables. The technique is often used in experimental design where we wish
to assess the impact of stimuli (treatments or independent variables) on
one or more dependent variables. Although widely used for assessing
experimental results in the pharmaceutical and social sciences, ANOVA
has recently gained traction in marketing, especially when applied to understanding the effect of marketing stimuli on audience responses. To be consistent with the application of ANOVA to experimental design, we will use
the treatment and block concepts introduced in Chapter 6. Treatments
are the independent (or predictive) variables in an experimental design,
calibrated to observe potential causality. Blocks are groups of similar units
of which roughly equal numbers of units are assigned to each treatment
group. Units are objects, individuals, or subjects that are either subjected,
or not subjected to a treatment in an experiment.
One-way ANOVA is used to analyze the treatment effects on subjects
in an experiment. Two-way ANOVA is used to analyze both the treatment and the block (sometimes referred to as replication) effects on subjects
(Snedecor and Cochran 1989).
One-way ANOVA assumes that any changes in subject behavior or characteristics are the result of treatment. This relationship between the subject behavior (dependent variable) and the treatment effects is expressed as:

Xᵢⱼ = μ + τⱼ + εᵢⱼ   (7.33)

where Xᵢⱼ is the value of the dependent variable for subject i under treatment j, μ is the mean of the dependent variable across all subjects, τⱼ is the difference between μ and μⱼ, the mean of the dependent variable for the subjects under treatment j, and εᵢⱼ is the error term of subject i under treatment j. The error term represents the portion of subject behavior change that is due to random effects.
In a one-way ANOVA analysis, the F statistic is used to assess the statistical significance of the treatment effects. The F statistic is the ratio of the mean treatment sum of squares (defined below) and the mean error sum of squares (defined below), with (a − 1, n − a) degrees of freedom given a treatments and n subjects. The mean treatment sum of squares is defined as

(1/(a − 1)) Σ_{j=1}^{a} nⱼ(μⱼ − μ)²   (7.34)

where nⱼ is the number of subjects under treatment j, and μⱼ is the mean of the dependent variable of the subjects under treatment j.

The mean error sum of squares is

(1/(n − a)) Σ_{j=1}^{a} Σ_{i=1}^{nⱼ} (Xᵢⱼ − μⱼ)²   (7.35)

Hence, the F statistic is

F = [Σ_{j=1}^{a} nⱼ(μⱼ − μ)² / (a − 1)] / [Σ_{j=1}^{a} Σ_{i=1}^{nⱼ} (Xᵢⱼ − μⱼ)² / (n − a)]   (7.36)
If the F statistic for the treatment effects is greater than the critical F value at a specified significance level, then there is a difference in the mean of the dependent variable across the different treatment groups and the treatment effects are statistically significant.
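As a quick illustration of the one-way case (our sketch, with made-up treatment groups), scipy's f_oneway computes the F statistic of Eq. 7.36 and its p-value:

import numpy as np
from scipy.stats import f_oneway

# Made-up response metric for subjects under three treatments
treatment_1 = np.array([12.0, 14.1, 13.5, 15.2, 12.8])
treatment_2 = np.array([16.3, 15.8, 17.1, 16.9, 15.5])
treatment_3 = np.array([13.2, 12.9, 14.0, 13.7, 12.5])

f_statistic, p_value = f_oneway(treatment_1, treatment_2, treatment_3)
print(f_statistic, p_value)
# If p_value is below the chosen significance level, the treatment means differ
# and the treatment effects are statistically significant.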
Two-way ANOVA assumes that changes in subject behavior or characteristics are due to the treatment effects and the block effects. This relationship between the subject behavior, the treatment effects, and the block effects is expressed as

Xᵢⱼₖ = μ + τⱼ + βₖ + εᵢⱼₖ   (7.37)

where Xᵢⱼₖ is the value of the dependent variable of subject i in block k under treatment j, μ is the mean of the dependent variable across all subjects, τⱼ is the difference between μ and μⱼ, the mean of the dependent variable for the subjects under treatment j, βₖ is the difference between μ and μₖ, the mean of the dependent variable for the subjects in block k, and εᵢⱼₖ is the error term of subject i in block k under treatment j.
For a two-way ANOVA analysis, the F statistic is used to assess the statistical significance of the treatment effects and the block effects. The F statistic for assessing the treatment effects is the ratio of the mean treatment sum of squares (Eq. 7.34) and the mean error sum of squares, with (a − 1, ab − a − b + 1) degrees of freedom given a treatments and b blocks. The F statistic for assessing the statistical significance of the treatment effects is

F(a−1, ab−a−b+1) = [Σ_{j=1}^{a} nⱼ(μⱼ − μ)² / (a − 1)] / [(Σ_{i,j,k} (Xᵢⱼₖ − μ)² − Σ_{k=1}^{b} mₖ(μₖ − μ)² − Σ_{j=1}^{a} nⱼ(μⱼ − μ)²) / (ab − a − b + 1)]   (7.38)

where Xᵢⱼₖ is the value of the dependent variable for subject i in block k under treatment j, μ is the mean of the dependent variable across all subjects, μⱼ is the mean of the dependent variable for the subjects under treatment j, μₖ is the mean of the dependent variable for the subjects in block k, nⱼ is the number of subjects under treatment j, and mₖ is the number of subjects in block k.
If the F statistic for the treatment effects is greater than the critical F
value at a specified significance level, then there is a difference in the
mean of the dependent variable between treatments and the treatment
effects are statistically significant.
The F statistic for assessing the block effects is the ratio of the mean block sum of squares and the mean error sum of squares with (b − 1, ab − a − b + 1) degrees of freedom given a treatments and b blocks. The mean block sum of squares is

(1/(b − 1)) Σ_{k=1}^{b} mₖ(μₖ − μ)²   (7.39)

The F statistic for assessing the statistical significance of the block effects is the ratio of the mean block sum of squares and the mean error sum of squares with (b − 1, ab − a − b + 1) degrees of freedom (Mead 1989).

F(b−1, ab−a−b+1) = [Σ_{k=1}^{b} mₖ(μₖ − μ)² / (b − 1)] / [(Σ_{i,j,k} (Xᵢⱼₖ − μ)² − Σ_{k=1}^{b} mₖ(μₖ − μ)² − Σ_{j=1}^{a} nⱼ(μⱼ − μ)²) / (ab − a − b + 1)]   (7.40)
If the F statistic for the block effects is greater than the critical F value at
a specified significance level, then there is a difference in the mean of the
dependent variable between blocks. In other words, the block effects are
statistically significant.
Canonical correlation analysis
Canonical correlation analysis is used to analyze correlation between
two groups of metric variables, where each set consists of one or more
variables. Simple and multiple linear regression are particular cases of
canonical analysis. Simple linear regression has one variable in both sets,
while multiple linear regression has one variable in one set of variables,
and multiple variables in the other set. Often, one set of variables is interpreted as dependent variables and the other as independent variables, as
in the case of linear regression (Dillon and Goldstein 1984).
Given a set of X variables and a set of Y variables, a canonical analysis
finds X*, a linear combination of the X variables, and Y*, a linear combination of the Y variables such that X* and Y* are highly correlated. X* and Y*
are called canonical variates. The analysis often results in multiple sets of
X* and Y*. The coefficients in the linear combinations are called canonical
weights. With this, X* and Y* can be written as:
X* = a₁X₁ + a₂X₂ + … + aₘXₘ   (7.41)
Y* = b₁Y₁ + b₂Y₂ + … + bₚYₚ   (7.42)
where a1, a2, …, am are the canonical weights for canonical variate X*, and
b1, b2, …, bp are the canonical coefficients for the canonical variate Y*.
The set of X* and Y* with the highest correlation among all the possible set of canonical variates is called the first set of canonical variates.
Canonical variates are normalized to have unit variance (Dillon and
Goldstein 1984).
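The canonical variates and their correlations can be estimated with a library routine. The following Python sketch (ours, with simulated data) uses scikit-learn's CCA and then correlates the resulting pairs of variates:

import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))                    # e.g. respondent characteristics
Y = 0.5 * X[:, :2] + rng.normal(size=(300, 2))   # e.g. satisfaction ratings, partly driven by X

cca = CCA(n_components=2).fit(X, Y)
X_c, Y_c = cca.transform(X, Y)                   # canonical variates X* and Y*

for i in range(2):                               # canonical correlation of each variate pair
    r = np.corrcoef(X_c[:, i], Y_c[:, i])[0, 1]
    print(i + 1, round(r, 3))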
We next discuss an example of canonical correlation analysis. Assume
company ABC conducts an online survey to measure how the demographics of its website visitors correlate with their satisfaction about the
firm’s website. The respondents are asked to rate their satisfaction in the
following five areas.
● Satisfaction with ABC as a company
● Overall satisfaction with the website
● Satisfaction with the content of the website
● Satisfaction with the ease of navigation of the website
● Satisfaction with the presentation of the website.
The respondents are also asked to answer the following set of questions
about their own characteristics (web activities and their demographics).
● Frequency of their visits to the website of company ABC
● The number of personal computers in the respondent's company
● The number of employees in the respondent's company
● Their need for technology consulting services.
The canonical correlation analysis technique can be applied to determine
how the respondent satisfaction about the website correlates with the
respondent characteristics (demographics and web activities). Figure 7-13
shows the SAS output of the canonical correlation coefficients. The four
largest canonical correlation coefficients are 0.39, 0.19, 0.07, and 0.007. The
canonical analysis creates four satisfaction canonical variates and four
characteristics canonical variates. Figure 7-14 shows the eigenvalues associated with each of the four canonical coefficients. Table 7-8 shows how
the five original satisfaction variables are correlated with the four satisfaction canonical variates. Table 7-9 shows the Pearson product-moment
correlation coefficients between the four original respondent characteristic variables and the four characteristic canonical variates.
Multi-dimensional scaling analysis
The technique is used to construct a low-dimensional (two-dimensional,
for example) map that best describes the relative positions and proximity of
members in a multi-dimensional space. Multi-dimensional scaling (MDS)
can be applied to either metric or nonmetric data. MDS applied to metric
and nonmetric data are called metric MDS and nonmetric MDS, respectively. MDS is similar to factor analysis and principal component analysis
in that all of them are data reduction techniques. Here, we will focus on
metric MDS.
Figure 7-13 Top four canonical correlation coefficients (SAS CANCORR procedure output).

    Canonical     Adjusted canonical   Approximate      Squared canonical
    correlation   correlation          standard error   correlation
1   0.391113      0.389369             0.012468         0.152969
2   0.190260      0.187646             0.014187         0.036199
3   0.071957      .                    0.014644         0.005178
4   0.007380      .                    0.014719         0.000054
Figure 7-14 Canonical correlation coefficients and their corresponding eigenvalues.

Test of H0: the canonical correlations in the current row and all that follow are zero.
Eigenvalues of Inv(E)*H = CanRsq/(1 − CanRsq)

    Eigenvalue   Difference   Proportion   Cumulative   Likelihood ratio   Approximate F value   Num df   Den df   Pr > F
1   0.1806       0.1430       0.8083       0.8083       0.81209783         49.48                 20       15281    <.0001
2   0.0376       0.0324       0.1681       0.9765       0.95875856         16.30                 12       12192    <.0001
3   0.0052       0.0052       0.0233       0.9998       0.99476798         4.03                  6        9218     0.0005
4   0.0001                    0.0002       1.0000       0.99994554         0.13                  2        4610     0.8820
Metric MDS, often referred as classical scaling, uses the distance
between members to measure their similarity in certain attributes. There
are three steps in performing a metric MDS analysis (Dillon and Goldstein
1984).
Table 7-8 Correlation between the original satisfaction variables and their canonical variates

Satisfaction variable        Canonical variate 1   Canonical variate 2   Canonical variate 3   Canonical variate 4
Brand satisfaction           0.8785                0.0247                0.1675                0.0672
Website satisfaction         0.2004                0.0040                0.3560                0.6775
Content satisfaction         0.2322                0.4814                0.7684                0.2655
Navigation satisfaction      0.2457                0.2835                0.1172                0.6796
Presentation satisfaction    0.8686                0.8539                0.1245                0.0590

Table 7-9 Correlation between the original company characteristic variables and their canonical variates

Characteristic      Canonical variate 1   Canonical variate 2   Canonical variate 3   Canonical variate 4
Consulting need     0.8568                0.0949                0.5022                0.0686
Visit frequency     0.4301                0.8617                0.2637                0.0546
Company size        0.4031                0.4835                0.7058                0.3251
Number of PCs       0.4025                0.4806                0.7448                0.2285
The first step of a metric MDS analysis is to create a distance matrix D for a group of members with p attributes. The entries in D, denoted by dᵢⱼ, are the Euclidean distances between the ith and jth members. If Xᵢₖ is the value of the kth attribute of member i, and Xⱼₖ is the value of the kth attribute of member j, the Euclidean distances are given by

dᵢⱼ = { Σ_{k=1}^{p} (Xᵢₖ − Xⱼₖ)² }^(1/2)   (7.43)

Matrix D is symmetric with zero diagonal elements and non-negative off-diagonal elements.
The second step in metric MDS analysis is to transform the distance matrix D into another symmetric matrix B, whose entries are defined as follows

bᵢⱼ = −½ (dᵢⱼ² − d̄ᵢ.² − d̄.ⱼ² + d̄..²)   (7.44)

where d̄ᵢ.² = (1/n) Σⱼ dᵢⱼ², d̄.ⱼ² = (1/n) Σᵢ dᵢⱼ², and d̄..² = (1/n²) Σⱼ Σᵢ dᵢⱼ².
The third step in the analysis is to compute the two largest eigenvalues of
matrix B and their associated eigenvectors. The elements of the eigenvectors
are the coordinates of the members in the new lower-dimensional space.
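The three steps can be written out directly in Python. The sketch below (ours) applies classical scaling to the student scores of Table 7-1; scaling each eigenvector by the square root of its eigenvalue is the usual convention so that the original distances are approximately reproduced:

import numpy as np

def classical_mds(X, dimensions=2):
    # Step 1: squared Euclidean distance matrix (Eq. 7.43)
    diff = X[:, None, :] - X[None, :, :]
    D2 = (diff ** 2).sum(axis=2)
    # Step 2: double-centered matrix B (Eq. 7.44)
    row_mean = D2.mean(axis=1, keepdims=True)
    col_mean = D2.mean(axis=0, keepdims=True)
    B = -0.5 * (D2 - row_mean - col_mean + D2.mean())
    # Step 3: coordinates from the largest eigenvalues and eigenvectors of B
    eigenvalues, eigenvectors = np.linalg.eigh(B)
    order = np.argsort(eigenvalues)[::-1][:dimensions]
    return eigenvectors[:, order] * np.sqrt(np.maximum(eigenvalues[order], 0))

# Student scores from Table 7-1 projected onto two dimensions
scores = np.array([[60, 90, 78], [100, 85, 90], [55, 70, 40],
                   [98, 65, 95], [70, 80, 44], [98, 100, 78]], dtype=float)
print(classical_mds(scores).round(2))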
Time series analysis
Time series analysis is used to analyze data collected at uniformly spaced time intervals. An example of time series data is the trade volume of a publicly traded company at the beginning of the month (Figure 7-15).
The objective of time series analysis is twofold:
● To describe the pattern of time series data as a function of time and other variables. A pattern is a stable function that describes the general trend of a time series.
● To forecast the occurrence of future events based on historical data. Forecasting must be understood in terms of probabilities of occurrence given the historical information represented by the data.

Figure 7-15 Trade volume time series: monthly trade volume (in thousands) plotted by date, January 2002 through late 2005.
Autocorrelation is an important concept in time series analysis. The autocorrelation coefficient is the correlation coefficient of a time series variable at time t and the same variable at time t + k (a time lag of k), and can be expressed as rₖ (Miller and Wichern 1977).

rₖ = [(1/N) Σ_{t=1}^{N−k} (yₜ − ȳ)(yₜ₊ₖ − ȳ)] / [(1/N) Σ_{t=1}^{N} (yₜ − ȳ)²] = [Σ_{t=1}^{N−k} (yₜ − ȳ)(yₜ₊ₖ − ȳ)] / [Σ_{t=1}^{N} (yₜ − ȳ)²]   (7.45)
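Eq. 7.45 is straightforward to compute. The following Python sketch (ours) evaluates the first few autocorrelation coefficients of the ten trade volumes listed in Table 7-10, rounded to thousands:

import numpy as np

def autocorrelation(y, k):
    # Autocorrelation coefficient r_k of Eq. 7.45
    y = np.asarray(y, dtype=float)
    d = y - y.mean()
    return (d[:-k] * d[k:]).sum() / (d ** 2).sum()

# First ten monthly trade volumes from Table 7-10, rounded to thousands
volume = [5287, 4668, 4347, 4882, 5023, 5086, 5841, 4583, 5014, 4588]
print([round(autocorrelation(volume, k), 3) for k in (1, 2, 3)])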
Time series analysis is usually conducted under conditions of stationarity. In a stationary time series, the variable under analysis has constant mean, variance, and autocorrelation. To facilitate the type of time series analysis we discuss here, nonstationary time series data needs to be transformed until it becomes stationary. The most common transformation is to difference the series by replacing the value of a variable at time t with the difference between the value at time t and the value at time t − 1. Sometimes, achieving stationarity may require multiple differencing iterations.
Here we focus on autoregressive (AR) and moving average (MA) time series analysis models. In an autoregressive (AR) model, variable Y at time t is a function of variable Y at time t − k,

Yₜ = δ + φₖYₜ₋ₖ + εₜ,  k = 1, 2, …   (7.46)

where δ is the intercept, φₖ is an autoregressive parameter, and εₜ is the error term at time t.
Autoregressive models are based on five assumptions.
● Assumption 1: The expected value of the error term is zero, E(εₜ) = 0.
● Assumption 2: The error term has constant variance, var(εₜ) = σ².
● Assumption 3: The covariance between error terms is zero, E(εᵢεⱼ) = 0 where i ≠ j.
● Assumption 4: The covariance between the error term and the time series value at time t − k is zero, E(εₜyₜ₋ₖ) = 0.
● Assumption 5: −1 < φₖ < 1.
A first-order autoregressive model is a model where the value of variable Y at time t, Yₜ, is a linear function of the value of variable Y at time t − 1, Yₜ₋₁. A first-order autoregressive model can be expressed as follows

Yₜ = δ + φ₁Yₜ₋₁ + εₜ   (7.47)

where δ is an intercept, φ₁ is the first-order autoregressive parameter, and εₜ is the error term at time t.

A second-order autoregressive model is a model where Yₜ is a linear function of Yₜ₋₁ and Yₜ₋₂.

Yₜ = δ + φ₁Yₜ₋₁ + φ₂Yₜ₋₂ + εₜ   (7.48)

where δ is an intercept, φ₁ is the first-order autoregressive parameter, φ₂ is the second-order autoregressive parameter, and εₜ is the error term at time t.

We can extend the autoregressive model to a kth-order model

Yₜ = δ + Σ_{i=1}^{k} φᵢYₜ₋ᵢ + εₜ   (7.49)

where δ is an intercept, φᵢ is the ith-order autoregressive parameter, and εₜ is the error term at time t.
In an MA model, variable Y at time t is a function of the error terms at various points in time.

yₜ = μ + eₜ − θₖeₜ₋ₖ,  k = 1, 2, …   (7.50)

with μ a constant, eₜ an error term at time t, eₜ₋ₖ the error term at time t − k, and θₖ an MA parameter.

MA models are based on four assumptions.
● Assumption 1: The expected value of the error term is zero, E(eₜ) = 0.
● Assumption 2: The error term has constant variance, var(eₜ) = σ².
● Assumption 3: The covariance between error terms is zero, E(eᵢeⱼ) = 0, where i ≠ j.
● Assumption 4: −1 < θₖ < 1.

A first-order MA model is a model where the value of variable Y at time t, Yₜ, is a function of the error term at time t, eₜ, and the error term at time t − 1, eₜ₋₁.

yₜ = μ + eₜ − θ₁eₜ₋₁   (7.51)

where μ is a constant and θ₁ is a first-order MA parameter.

In a second-order MA model, Yₜ is a linear function of the error terms eₜ, eₜ₋₁, and eₜ₋₂.

Yₜ = μ + eₜ − θ₁eₜ₋₁ − θ₂eₜ₋₂   (7.52)

where μ is a constant, θ₁ is a first-order MA parameter, and θ₂ is a second-order MA parameter.
The autocorrelation function (ACF) and partial autocorrelation function (PACF) can be used to determine the appropriate type of model (AR
or MA) to use. An ACF plot, referred to as a correlogram, is a chart that
displays the autocorrelation coefficient (defined in Eq. 7.45), along with
its confidence interval, against the number of time unit lags. The mutual
influence of elements in a time series that are separated by more than one
lag is affected by the correlation of intermediate lags. The PACF expresses
the correlation coefficient between a variable and a particular lag when
the correlation effects of intermediate lags are removed. To compute the
PACF of a particular lag, the linear influence of intermediate lags must
be removed. In the particular case of unit lag, partial autocorrelation and
autocorrelation are equivalent. The following are some rules for using
ACF and PACF plots as suggested by the SPSS software.
● If a time series is not stationary, its ACF plot will likely show that the autocorrelation coefficient remains high for half a dozen or more lags, rather than quickly declining to zero.
● If a time series has an autoregressive property, its ACF plot will be characterized by an exponentially declining autocorrelation. Its PACF plot will show spikes in the first few lags. The number of distinct spikes indicates the order of an autoregressive model.
● If a time series has an MA property, its correlogram will show spikes in the first few lags. Its PACF plot will be characterized by an exponentially declining pattern. The number of distinct spikes in the correlogram gives an indication of the order of the MA model needed to satisfactorily describe the time series.
As an example of the application of ACF and PACF to time series analysis, consider the data in Figure 7-15, describing monthly stock trade volume of a financial company from January 2002 to December 2005. The ACF
plot of this time series is illustrated in Figure 7-16. In this plot, the autocorrelation coefficient remains significant for half a dozen or more lags,
rather than quickly declining to zero. The pattern indicates that the data
is not stationary and will need to be differenced. The result of the first differencing is shown in Table 7-10: the value of the variable 'Volume_1' on February 1, 2002, which is −619,343, is computed by subtracting 5,287,016, the trade volume on January 7, 2002, from 4,667,673, the trade volume on February 1, 2002.
Using the differenced data, we produce new ACF and PACF plots,
shown in Figures 7-17 and 7-18 to examine any potential autoregressive
or MA properties in the data.
The ACF plot in Figure 7-17 shows an overall decline in ACF, but there
are still significant values for lag values of eight, twelve, and thirteen. The
PACF plot in Figure 7-18 shows spikes for lag values of one, two, three,
five, seven, and nine. We next apply one additional degree of differencing to remove the spikes at lags five, seven, and nine. The resulting ACF and PACF are shown in Figure 7-19 and Figure 7-20, respectively.

Figure 7-16 Stock trade volume time series ACF (SPSS output).
After two degrees of differencing, the spikes at lags one, two, and three
of the PACF chart remain but those at lag five, seven, and nine are significantly diminished. At this stage, we can attempt to fit an autoregressive
model with the data at two degrees of differencing.
Some time series models possess both autoregressive and moving average properties after the underlying data has been differenced appropriately to achieve stationarity. These models are referred to as autoregressive integrated moving average models, ARIMA(p, d, q). Parameters p, d, and q indicate the autoregressive order, the number of differencing passes, and the moving average order, respectively.
Standard ARIMA time series analysis is a three-step process: model
specification, parameter estimation, and diagnostic checking (Miller and
Wichern 1977).
Table 7-10 Differencing the data – the first ten data points of the time series

Date        Close price   Trade volume   Volume_1     Volume_2     Volume_3
7-Jan-02    14.37         5,287,016
1-Feb-02    13.04         4,667,673      −619,343
1-Mar-02    13.09         4,346,905      −320,768     298,575
1-Apr-02    11.39         4,882,322      535,417      856,185      557,610
1-May-02    12.09         5,022,504      140,182      −395,235     −1,251,420
3-Jun-02    11.20         5,086,200      63,696       −76,486      318,749
1-Jul-02    8.95          5,840,595      754,395      690,699      767,185
1-Aug-02    9.18          4,582,559      −1,258,036   −2,012,431   −2,703,130
3-Sep-02    8.70          5,014,290      431,731      1,689,767    3,702,198
1-Oct-02    9.18          4,588,421      −425,869     −857,600     −2,547,367
Figure 7-17 ACF with difference of one (SPSS output).
Figure 7-18 PACF with difference of one (SPSS output).
● Model specification: The first step in building an ARIMA model is to identify the appropriate values for the model parameters p, d, and q. Plots of the raw data, such as the ACF and PACF, can be used to uncover any underlying data pattern. Take as an example the time series data on stock trade volume in Figure 7-15. The time series becomes stationary and exhibits an autoregressive pattern of up to three lags after two degrees of differencing.
● Parameter estimation: During the parameter estimation phase, a function minimization algorithm is used to maximize the likelihood of the observed time series, given the parameter values. In regression, this requires minimization of the sum of squares of the errors.
● Diagnostic checking: In the diagnostic checking phase, the data is examined to uncover any violation of the key assumptions. The outcome of diagnostic checking may lead to modification of the model by changing the number of parameters.
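Statistical packages automate the estimation step. The sketch below (ours, with a simulated series standing in for the trade volume history) fits an ARIMA(3, 2, 0) model, mirroring the specification suggested above, using the statsmodels library:

import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(0)
# Simulated monthly series standing in for the trade volume history
volume = 5000 + np.cumsum(rng.normal(0, 300, size=48))

# ARIMA(p=3, d=2, q=0): a third-order autoregressive model fitted to the
# twice-differenced series, as suggested by the ACF/PACF plots above
results = ARIMA(volume, order=(3, 2, 0)).fit()
print(results.summary())
print(results.forecast(steps=3))   # forecast the next three periods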
Figure 7-19 ACF with difference of two (SPSS output).
Conjoint analysis
Conjoint analysis is used to quantify the relationship between product
utility and a product feature, such as price, warranty, color, and quality
based on feedback from prospective customers. The level of product utility can be quantified by product ranking. Data used for conjoint analysis is usually acquired from survey studies where potential customers
are asked to rate real or fictitious products with various combinations of
features. The application of conjoint analysis enables product vendors to
create the best combination of features to maximize the appeal of their
products to potential buyers.
Conjoint analysis is based on a combination of data transformation and
linear regression. The dependent variable in a conjoint analysis is product ranking, which can be treated as either metric or nonmetric variable.
Depending on how the dependent variable is treated, the analysis can
be classified as metric conjoint analysis, where the dependent variable is
treated as metric, and nonmetric conjoint analysis, where the dependent variable is regarded as nonmetric (Kuhfeld 2005). Here, we will focus on metric conjoint analysis.

Figure 7-20 PACF with difference of two (SPSS output).
Metric conjoint analysis evaluates the relationship between feature utility (also known as part-worth utility) and product ranking. To illustrate
this concept, consider the case of a product with two features, color and
size. Assume that a car can be red, white, or blue. Here, color is one feature of interest, and red, white, and blue are this feature’s values. Assume
size is another feature of interest, with small and large as its values.
Assume that in a survey, two potential customers are shown cars with six
combinations of colors and sizes and are asked to rate each car on a scale
of 1 to 10. Table 7-11 shows the data collected from the survey.
To conduct conjoint analysis, using the survey information exemplified
in Table 7-11, we first derive a relationship between product rating and
product features of the form
yᵢ,ⱼ = μ + colorᵢ + sizeⱼ + εᵢ,ⱼ   (7.53)
Table 7-11 Data illustration for conjoint analysis (in practice, a large number of rating columns must be used to obtain statistically significant estimates)

Product             Rating given by potential customer 1   Rating given by potential customer 2
A large red car     5                                      8
A large blue car    6                                      9
A large white car   3                                      3
A small red car     8                                      5
A small blue car    2                                      7
A small white car   9                                      6
where yᵢ,ⱼ is the rating when the color feature has value i and the size feature has value j. The value μ is the mean rating across colors and sizes. The parameters colorᵢ and sizeⱼ, which are computed using linear regression, represent the deviations of y_color,size from μ due to the color and size features (i stands for either red, blue, or white, and j for large or small), respectively.
To estimate the values of these two parameters, we utilize linear regression after transforming the original product feature variables, which are categorical variables, into metric dummy variables X_red, X_blue, X_white, X_large, and X_small.

y_color,size = μ + color_red X_red + color_blue X_blue + color_white X_white + size_small X_small + size_large X_large + ε_color,size   (7.54)

ANOVA is used to determine whether colorᵢ and sizeⱼ are statistically significant in impacting the product rating.
Once colorᵢ and sizeⱼ are estimated and their statistical significance established, a decision can be made about which of the product features considered in the analysis must be incorporated into the product to maximize its appeal to customers.
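For the balanced layout of Table 7-11, the part-worths of Eq. 7.53 can be estimated simply as deviations of the level means from the grand mean, which for such a design coincides with the least squares estimates of the additive model. A small Python sketch of this calculation (ours; in a real study far more respondents would be needed):

import numpy as np

# Ratings from Table 7-11 (customer 1 followed by customer 2) for the six color/size combinations
colors = np.array(['red', 'blue', 'white', 'red', 'blue', 'white'] * 2)
sizes = np.array(['large', 'large', 'large', 'small', 'small', 'small'] * 2)
ratings = np.array([5, 6, 3, 8, 2, 9,     # customer 1
                    8, 9, 3, 5, 7, 6])    # customer 2

grand_mean = ratings.mean()                                     # estimate of mu in Eq. 7.53
color_effects = {c: ratings[colors == c].mean() - grand_mean
                 for c in ('red', 'blue', 'white')}             # color part-worths
size_effects = {s: ratings[sizes == s].mean() - grand_mean
                for s in ('large', 'small')}                    # size part-worths
print(round(grand_mean, 2), color_effects, size_effects)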
Logistic regression
Logistic regression is used to estimate the probability of the occurrence of
an event. In direct marketing, logistic regression is one of the most widely
used techniques for building targeting models, such as response and conversion models. Figure 7-21 shows the relationship between the probability of the event occurring and value of the independent variable.
Figure 7-21 Probability of the event occurring vs. the value of the independent variable.

The odds ratio of event Y, PY/PN, where PY and PN add up to one, is defined as the ratio of the probability of occurrence of Y (PY) and the probability of no occurrence of Y (PN). In logistic regression, the natural logarithm of the odds ratio is assumed to be a linear function of the independent variables that are predictive of the outcome of event Y. A logistic
regression model can be expressed as (Gujarati, 1988):
ln(PY/PN) = β₀ + β₁X₁ + β₂X₂ + … + βₖXₖ   (7.55)
where the Xᵢ's are the independent variables and the βᵢ's are coefficients, whose
statistical significance can be assessed with a chi-square statistic. To deal
with nonmetric independent variables, we introduce dummy variables,
which have a numeric value of zero or one depending on whether a particular state obtains or not. Take as an example, an income variable with
three possible values, ‘high-income’, ‘medium-income’, and ‘low-income’.
The variable needs to be transformed to three new binary variables named
‘high-income’, ‘medium-income’, and ‘low-income’. The value of each of
these three variables for a member is one if the member is in the group
and is zero otherwise.
Based on Eq. 7.55, we can derive the probability of occurrence of event Y:

PY = e^(β₀ + β₁X₁ + … + βₖXₖ) / (1 + e^(β₀ + β₁X₁ + … + βₖXₖ)),  k = 1, 2, 3, …   (7.56)
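The coefficients of Eq. 7.55 and the probability of Eq. 7.56 can be estimated with standard software. The following Python sketch (ours, with simulated response data and made-up predictors) uses scikit-learn's LogisticRegression:

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Simulated data: age and income (in $000) as predictors of response (1) vs. no response (0)
X = np.column_stack([rng.uniform(20, 60, 500), rng.uniform(20, 120, 500)])
true_logit = -6 + 0.08 * X[:, 0] + 0.03 * X[:, 1]
y = (rng.random(500) < 1 / (1 + np.exp(-true_logit))).astype(int)

model = LogisticRegression().fit(X, y)
print(model.intercept_, model.coef_)            # estimates of beta_0 and beta_1..k in Eq. 7.55
print(model.predict_proba([[35, 80]])[:, 1])    # estimated response probability, Eq. 7.56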
Association analysis
Association analysis addresses the question: What products do customers
tend to purchase together? This tendency is expressed in terms of rules
that indicate the likelihood of two or more products being purchased by
the same person over a period of time. Symbolically, a rule indicating that
a person who purchases product A tends to purchase product B is commonly expressed as A;B.
Three standard measures are used to quantify the significance of such
a rule: support, confidence, and lift. These measures are typically supported by data mining software, such as SAS Enterprise Miner, IBM
Intelligent Miner, and Associate Engine.
Confidence is the conditional probability that a customer will purchase product B given that he has purchased product A from a particular vendor. Lift measures how many times more likely a customer who has purchased product A is to purchase product B, compared with customers who are randomly selected from the sample. Support is the percentage of
total transactions where A and B are purchased together from the vendor
during a specified period of time.
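The three measures are easy to compute from transaction data. The sketch below (ours, with made-up baskets and hypothetical product names) evaluates them for the rule grill ⇒ charcoal:

# Each transaction is the set of products bought together by one customer (made-up data)
transactions = [{'grill', 'charcoal'}, {'grill', 'charcoal', 'tongs'}, {'charcoal'},
                {'grill', 'tongs'}, {'charcoal', 'tongs'}, {'grill', 'charcoal'}]

def rule_measures(transactions, a, b):
    n = len(transactions)
    n_a = sum(a in t for t in transactions)
    n_b = sum(b in t for t in transactions)
    n_ab = sum(a in t and b in t for t in transactions)
    support = n_ab / n                  # share of transactions containing both A and B
    confidence = n_ab / n_a             # P(B purchased | A purchased)
    lift = confidence / (n_b / n)       # confidence relative to the base rate of B
    return support, confidence, lift

print(rule_measures(transactions, 'grill', 'charcoal'))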
Combinations of products with high values for confidence, lift, and support are highly associated. There is no set criterion for a cut-off threshold
for any of these measures below which the level of association can be considered negligible. Once a vendor has identified a combination of products that are highly associated, the products can be marketed as a bundle
for cross-sell purposes.
Collaborative filtering
Collaborative filtering aims to forecast individual preferences about products or services. To achieve this, collaborative filtering relies on the preferences of a group of people that are "similar" (in a sense we will describe shortly) to the individuals whose preferences are being studied. Collaborative filtering is a technique often applied in marketing for cross-sell and up-sell analysis.
To quantify similarity, we compute correlation between individuals
based on their attributes, such as their ratings or purchases of particular
products. Those who have the highest correlation with a particular individual are referred to as this individual’s nearest neighbors. Consider
the following example: Five individuals (Susan, David, Anna, Linda,
and Tom) are asked to rate six movies ('Traffic', 'Cast Away', 'Chocolate', 'Mission Impossible', 'The Gift', and 'Quills'). Susan does not rate two of the six movies, 'The Gift' and 'Quills'. A collaborative filtering analysis is
conducted to predict which of these two movies is more appealing to Susan
so that an appropriate movie recommendation can be offered to her.
There are two questions that we need to address in this analysis.
●
●
Who are Susan’s nearest neighbors?
Which movie (‘The Gift’ or ‘Quills’) should we recommend to Susan?
To identify Susan’s nearest neighbor, we first calculate the Pearson correlation coefficient between individuals (as we discussed in Chapter 6)
based on the data in Table 7-12.
Table 7-12 Movie ratings for collaborative filtering

Individual   Traffic   Cast Away   Chocolate   Mission Impossible   The Gift   Quills
Susan        5         1           4           3                    N/A        N/A
David        4         2           4           3                    1          2
Anna         3         1           4           3                    2          5
Linda        1         2           5           1                    2          4
Tom          2         3           4           2                    2          3
The correlations turn out as follows:
Susan and David: 0.97
Susan and Anna: 0.81
Susan and Linda: 0.08
Susan and Tom: −0.15
David and Anna are Susan’s nearest neighbors since their correlation
coefficients with Susan have the highest values.
We next compute these two individuals' average rating of the two movies that Susan has not rated ('The Gift' and 'Quills'). The average of David's and Anna's ratings for 'The Gift' is 1.5, and for 'Quills' it is 3.5. Therefore, 'Quills' should be recommended to Susan since its average rating from David and Anna is higher than that of 'The Gift'.
Collaborative filtering is frequently used in marketing for product recommendations that lead to potential cross-sell and up-sell opportunities.
Amazon.com is an example of the use of this technique.
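A minimal sketch of this nearest-neighbor calculation in Python (pandas), using the ratings from Table 7-12, is shown below. It illustrates the logic only and is not meant to reproduce any particular commercial recommendation system.

import numpy as np
import pandas as pd

# Ratings from Table 7-12; NaN marks the movies Susan has not rated.
ratings = pd.DataFrame(
    {
        "Traffic":            [5, 4, 3, 1, 2],
        "Cast Away":          [1, 2, 1, 2, 3],
        "Chocolate":          [4, 4, 4, 5, 4],
        "Mission Impossible": [3, 3, 3, 1, 2],
        "The Gift":           [np.nan, 1, 2, 2, 2],
        "Quills":             [np.nan, 2, 5, 4, 3],
    },
    index=["Susan", "David", "Anna", "Linda", "Tom"],
)

# Pearson correlation of every individual with Susan over the movies Susan rated.
rated = ratings.loc["Susan"].dropna().index
corr = ratings[rated].T.corr().loc["Susan"].drop("Susan")
neighbors = corr.nlargest(2).index          # David (0.97) and Anna (0.81)

# Average the neighbors' ratings of the movies Susan has not rated.
unrated = ratings.columns.difference(rated)
predicted = ratings.loc[neighbors, unrated].mean()
print(corr.round(2))
print(predicted.idxmax())                   # 'Quills' (average 3.5 versus 1.5 for 'The Gift')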
■ References
Berry, M.J.A., and G. Linoff. Data Mining Techniques. John Wiley & Sons, New
York, 1997.
Box, G.E.P., G.M. Jenkins, and G.C. Reinsel. Time Series Analysis: Forecasting and
Control, 3rd ed. Prentice Hall, Upper Saddle River, New Jersey, 1994.
Breiman, L., J.H. Friedman, R.A. Olshen, and C.J. Stone. Classification and
Regression Tree. Chapman & Hall/CRC, Boca Raton, FL, 1998.
Brooks, C. Introductory Econometrics for Finance. Cambridge University Press, New
York, 2002.
Clausen, S-E. Applied Correspondence Analysis: An Introduction (Quantitative
Applications in the Social Sciences). Sage Publications, Thousand Oaks, CA, 1998.
Cooley, W.C., and P.R. Lohnes. Multivariate Data Analysis. John Wiley & Sons, New
York, 1971.
Dillon, W., and M. Goldstein. Multivariate Analysis – Methods and Applications. John
Wiley & Sons, New York, 1984.
Everitt, B.S., and S. Rabe-Hesketh. The Analysis of Proximity Data. Arnold, London,
1997.
Greenacre, M.J. Practical Correspondence Analysis. In V. Barnett (ed.) Interpreting
Multivariate Data (pp. 119–146). John Wiley & Sons, New York, 1981.
Klecka, W.R. Discriminant Analysis (Quantitative Applications in the Social Sciences).
Sage Publications, Thousand Oaks, CA, 1980.
Kuhfeld, W.F. Conjoint Analysis. SAS Technical Support Series TS-722H. SAS
Institute, Cary, NC, 2005.
Mead, R. The Design of Experiments – Statistical Principles for Practical Application.
Cambridge University Press, New York, 1988.
Miller, R., and D.W. Wichern. Intermediate Business Statistics: Analysis of Variance,
Regression, and Time Series. The Dryden Press, HRW (Holt, Rinehart, and Winston,
Inc.), Austin, TX, 1977.
Neter, J., W. Wasserman, and M.H. Kutner. Applied Linear Statistical Models. Irwin,
Homewood, IL, 1990.
Rud, O.P. Data Mining Cookbook. John Wiley & Sons, New York, 2001.
Snedecor, G.W., and W.G. Cochran. Statistical Methods, 8th ed. Iowa State
University, Ames, Iowa, 1989.
Struhl, S.M. Market Segmentation, An Introduction and Review. American Marketing
Association, Chicago, IL, 1992.
CHAPTER 8

Audience Segmentation
Audience segmentation is one of the most important marketing analytic
topics and an enabler in matching the right products, marketing message,
incentives, and creative with the right audiences. This chapter presents
four case studies that demonstrate the application of data mining techniques to audience segmentation analysis.
Effective segmentation is objective-driven. Segmentation analysis can
be performed based on one or more of the following objectives.
● Understanding the audience behaviors
● Understanding the audience needs
● Understanding the audience values
● Understanding the audience product ownership
● Understanding the audience satisfaction or pain points.
The four data mining techniques presented in the chapter are cluster
analysis, CART, CHAID, and discriminant analysis. Cluster analysis is an
unsupervised technique that examines the interdependence of a group of
independent variables with no regard for a dependent variable. This technique is used when there is more than a single segmentation objective, such
as understanding both the demographics and the behaviors of an audience. In this case, both objectives are important and there is no single designated dependent variable. In contrast, discriminant analysis and decision
tree approaches such as CHAID and CART, are supervised techniques where
there is a predetermined dependent variable. These techniques are used
when there is only one objective. For example, a segmentation study aiming at differentiating the values of different customers is better addressed
by a decision tree analysis where customer value is the dependent variable.
■ Case study one: behavior and
demographics segmentation
Travel services firm Travel Wind conducts a survey to collect customer
demographic and behavioral data. Travel Wind is offering basic travel
services such as flight tickets and hotel bookings. The company is considering selling more upscale vacation packages to both domestic and
foreign travelers. The survey intends to gauge the potential of such a premier offering.
Travel Wind conducts a study to segment the customers based on their
proclivity to domestic or foreign travel, their Internet and cell phone
usage, and their demographic profiles. Travel Wind wants to use the segmentation results to drive effective creative and marketing messaging, as
well as to select appropriate marketing channels for launching the new
vacation package offering.
The Travel Wind survey collects eight attributes from 4902 customers. Out of the 4902 customers, 4897 provide information on all the eight
attributes. All attributes except age have a nonmetric data type. The eight
attributes and their data types are:
● Age (metric)
● Household income (ordinal)
● Child presence (binary)
● Education (categorical)
● Internet usage (ordinal)
● Cellular phone usage (ordinal)
● Interest in US domestic travel (binary)
● Interest in foreign travel (binary).
Travel Wind applies hierarchical agglomerative clustering to segment
the 4897 members who answer all the survey questions. The technique
initially treats every member as a cluster and then groups members based
on inter-member distances.
The data is divided into two sets, one for model building (training)
and the other for validation. Around 70% of the data is randomly selected
for model building and the remaining data is used for validation.
Travel Wind uses the two-step cluster module of SPSS for the analysis. The SPSS proprietary module enables Travel Wind to analyze data of
mixed data types (metric and nonmetric). The first step of the SPSS two-step analysis is a pre-cluster process that groups members into small sub-clusters. The second step of the SPSS two-step cluster analysis combines
these small sub-clusters into larger clusters. The two-step cluster module
uses the Akaike’s Information Criterion (AIC) statistic (Akaike 1974) to
determine the robustness of the model and the optimal number of clusters
within the model. Lower AIC values indicate a more desirable configuration of model parameters. When the decrease in AIC becomes negligible, the
clustering application stops and the analysis is completed.
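The SPSS two-step module is proprietary, but the same “keep adding clusters until the drop in AIC becomes negligible” logic can be sketched with any model that reports an AIC. The snippet below uses a Gaussian mixture model from scikit-learn on simulated, numerically coded attributes; treating the ordinal survey responses as numeric codes, and the simulated data itself, are assumptions made purely for illustration.

import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Simulated stand-in for the survey: numerically coded attributes
# (age, income band, child presence, internet usage) for two latent groups.
mass = rng.normal(loc=[68, 2, 0, 1], scale=[10, 1, 0.3, 0.5], size=(1100, 4))
upscale = rng.normal(loc=[51, 4, 1, 3], scale=[9, 1, 0.3, 0.5], size=(2300, 4))
X = np.vstack([mass, upscale])

# Fit 1..15 clusters and record the AIC of each solution, as in Table 8-1.
aic = {k: GaussianMixture(n_components=k, random_state=0).fit(X).aic(X)
       for k in range(1, 16)}

# Stop once the decrease in AIC, relative to the first drop, becomes negligible.
drops = {k: aic[k - 1] - aic[k] for k in range(2, 16)}
best_k = 1
for k in range(2, 16):
    if drops[k] / drops[2] < 0.2:   # illustrative threshold for "negligible"
        break
    best_k = k
print(best_k)                        # expected to settle on 2 clusters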
Model building
Table 8-1 shows the SPSS output based on the model building data set.
The decrease in AIC value becomes negligible after the first two clusters
are created. Out of 3410 members used for model building, five of them
cannot be classified into either cluster. The net number of members in the
two clusters is 3405.
As illustrated in Table 8-2, over half of the members are classified into
cluster two. Cluster one consists of 32% of the sample while cluster two
consists of 68% of the sample.
Table 8-3 shows that the average age of household head in cluster one
(68 years old) is higher than that in cluster two (51 years old). The overall
average age of the members is 56.
Table 8-1 AIC statistic for determining the optimal number of clusters in case study one (model building stage)

Number of clusters   Akaike’s Information Criterion (AIC)   AIC change (a)   Ratio of AIC changes (b)   Ratio of distance measures (c)
1                    58129.304
2                    52521.868                              5607.436         1.000                      2.629
3                    50432.398                              2089.470         0.373                      0.989
4                    48319.882                              2112.515         0.377                      1.685
5                    47094.284                              1225.598         0.219                      1.012
6                    45883.723                              1210.562         0.216                      1.149
7                    44838.798                              1044.925         0.186                      1.296
8                    44048.546                               790.251         0.141                      1.003
9                    43260.541                               788.006         0.141                      1.092
10                   42544.896                               715.645         0.128                      1.078
11                   41886.134                               658.761         0.117                      1.050
12                   41262.019                               624.115         0.111                      1.030
13                   40658.209                               603.809         0.108                      1.130
14                   40131.944                               526.266         0.094                      1.095
15                   39657.293                               474.650         0.085                      1.038

(a) The changes are from the previous number of clusters in the table. (b) The
ratios of changes are relative to the change for the two-cluster solution. (c) The
ratios of distance measures are based on the current number of clusters against
the previous number of clusters. (d) Approximately 70% of cases (SAMPLE) selected.
Table 8-2 Cluster breakdown

            N      Fraction of combined
Cluster 1   1095   32.3%
Cluster 2   2310   67.8%
The statistics in Table 8-4 indicate that the members in cluster two are
better educated than those in cluster one. In cluster two, 44% of the members have a graduate school degree compared to 24% in cluster one.
Table 8-5 shows that the percentage of members with children in the
household is much higher in cluster two (44%) than in cluster one (11%).
Table 8-3 Mean and standard deviation of the age of household head by cluster

            Mean of age   Standard deviation of age
Cluster 1   67.75         13.967
Cluster 2   50.89          9.652
Combined    56.31         13.709
Table 8-4 Education level distribution by cluster

            High school       College           Graduate school   Vocational school
            Freq.   Percent   Freq.   Percent   Freq.   Percent   Freq.   Percent
Cluster 1   402     36.7      423     38.6      262     23.9      8       0.8
Cluster 2   366     15.8      914     39.6      1011    43.8      19      0.8
Combined    768     22.6      1337    39.3      1273    37.4      27      0.7
Table 8-5 Presence of children in household by cluster

            Unknown           Without children   With at least one child
            Freq.   Percent   Freq.   Percent    Freq.   Percent
Cluster 1   138     12.6      833     76.1       124     11.3
Cluster 2   85       3.7      1217    52.7       1008    43.6
Combined    223      6.6      2050    60.2       1132    33.3
Table 8-6 shows that members in cluster two are wealthier than those in
cluster one. Of the cluster two members, 71% have an annual household
income of over $100,000 while only 19% of the cluster one members have
the same household income.
Table 8-6 Household income distribution by cluster

            Less than $50 K   $50 K–$99,999     $100 K or higher
            Freq.   Percent   Freq.   Percent   Freq.   Percent
Cluster 1   301     27.5      586     53.5      208     19.0
Cluster 2   9        0.4      652     28.2      1649    71.4
Combined    310      9.1      1238    36.4      1857    54.5
With regard to travel proclivity, Table 8-7 shows that cluster one has
a higher percentage of frequent foreign travelers (63%) than cluster two
(52%). There is no significant difference in domestic travel proclivity
between cluster one (79%) and cluster two (74%), as shown in Table 8-8.
Table 8-7 Foreign travel proclivity by cluster

            Infrequent traveler   Frequent traveler
            Freq.   Percent       Freq.   Percent
Cluster 1   410     37.4          685     62.6
Cluster 2   1114    48.2          1196    51.8
Combined    1524    44.8          1881    55.2
Table 8-8 Domestic travel proclivity by cluster

            Infrequent traveler   Frequent traveler
            Freq.   Percent       Freq.   Percent
Cluster 1   230     21.0          865     79.0
Cluster 2   607     26.3          1703    73.7
Combined    837     24.6          2568    75.4
In terms of Internet and cell phone usage, cluster two shows a much
heavier usage of Internet and cell phone than cluster one (Table 8-9 and
Table 8-10).
Table 8-9 Internet usage by cluster

            Heavy user        Medium user       Light user
            Freq.   Percent   Freq.   Percent   Freq.   Percent
Cluster 1   265     24.2      255     23.4      575     52.5
Cluster 2   1730    74.9      401     17.4      179      7.8
Combined    1995    58.6      656     25.1      754     22.1
Table 8-10 Cell phone usage by cluster

            Heavy user        Medium user       Light user
            Freq.   Percent   Freq.   Percent   Freq.   Percent
Cluster 1   559     51.1      430     39.3      106     9.6
Cluster 2   2282    98.8      28       1.2      0       0.0
Combined    2841    83.4      458     13.5      106     3.1
Table 8-11 summarizes the characteristics of the two clusters.
We can describe cluster one as the ‘Mass’ cluster and cluster two as the
‘Upscale’ cluster. Travel Wind can utilize the clustering analysis results
and the following insights for effective target marketing.
● There are significant opportunities for selling travel vacation packages to the existing member base since over half of its members are either frequent foreign or domestic travelers.
● Travel Wind may consider launching two different sets of creative and messaging to the two distinct clusters based on income and age information. The members of the upscale cluster (cluster two) are likely to be either professionals at the peak of their careers or wealthy homemakers. The members of the mass cluster (cluster one) are likely to be retirees looking for a unique travel experience.
Table 8-11 Summary of cluster characteristics for case study one (model building stage)

Cluster 1 (Mass)
  Demographics: high concentration of high school and college education; medium household income; average age 68; low presence of children
  Travel propensity: frequent foreign travelers 63%; frequent domestic travelers 79%
  Internet and cell phone usage: light internet usage; high to medium cell phone usage

Cluster 2 (Upscale)
  Demographics: high concentration of college and graduate school education; high household income; average age 51; medium presence of children
  Travel propensity: frequent foreign travelers 52%; frequent domestic travelers 74%
  Internet and cell phone usage: high internet usage; high cell phone usage
● In terms of vacation package pricing, Travel Wind may offer higher-priced packages to the upscale cluster and less pricey packages to the mass cluster.
● Travel Wind can leverage Internet and mobile marketing channels to reach the upscale cluster given that the members of this cluster tend to be heavy users of the two technologies.
Model validation
Travel Wind needs to validate the cluster model with the remaining 30%
of the sample (1492 members). Three out of the 1492 records could not be
classified into either cluster and are dropped from the analysis. The
net number of members classified into either cluster is 1489. The validation result is very similar to the result from model building. Therefore, we
conclude that the clusters are very stable and robust. Table 8-12 shows the
AIC statistic of the validation set.
Table 8-12 AIC criterion for determining the optimal number of clusters for case study one (model validation stage)

Number of clusters   Akaike’s Information Criterion (AIC)   AIC change (a)   Ratio of AIC changes (b)   Ratio of distance measures (c)
1                    25538.570
2                    22987.475                              2551.095         1.000                      2.291
3                    21914.764                              1072.711         0.420                      1.379
4                    21156.948                               757.816         0.297                      1.134
5                    20497.376                               659.572         0.259                      1.031
6                    19859.790                               637.586         0.250                      1.248
7                    19363.405                               496.385         0.195                      1.446
8                    19042.244                               321.161         0.126                      1.062
9                    18744.128                               298.115         0.117                      1.005
10                   18447.974                               296.154         0.116                      1.041
11                   18166.176                               281.798         0.110                      1.011
12                   17888.133                               278.044         0.109                      1.233
13                   17676.206                               211.927         0.083                      1.075
14                   17484.113                               192.093         0.075                      1.095
15                   17314.920                               169.193         0.066                      1.002

(a) The changes are from the previous number of clusters in the table. (b) The ratios
of changes are relative to the change for the two-cluster solution. (c) The ratios of
distance measures are based on the current number of clusters against the previous
number of clusters. (d) Approximately 30% of cases (SAMPLE) selected.
Table 8-13 shows that 37% of the members are classified in cluster one
and the remaining 63% of the members are classified in cluster two. The
breakdown is similar to that of the modeling set.
The statistics in Table 8-14 indicate that the average household head age
in cluster one (67 years old) is higher than that in cluster two (51 years
old). The overall average age of the member base is 56.
Table 8-15 presents the statistics of education level by cluster. Members
in cluster two are better educated than those in cluster one: 45% of the
cluster two members have a graduate school degree compared to 24% in cluster one.
Table 8-13 Cluster breakdown

            N     % of combined
Cluster 1   557   37.4
Cluster 2   932   62.6
Table 8-14 Average household head age by cluster

            Mean of age   Standard deviation of age
Cluster 1   66.65         13.909
Cluster 2   50.53          9.484
Combined    56.56         13.764
Table 8-15 Education level distribution by cluster

            High school       College           Graduate degree   Vocational school
            Freq.   Percent   Freq.   Percent   Freq.   Percent   Freq.   Percent
Cluster 1   184     33.0      231     41.5      135     24.2      7       1.3
Cluster 2   153     16.4      356     38.2      415     44.5      8       0.9
Combined    336     22.6      587     39.4      550     36.9      15      1.0
The percentage of members with children in the household is
much higher in cluster two (46%) than in cluster one (14%) as shown in
Table 8-16.
Based on Table 8-17, cluster two is a wealthier group than cluster one.
Among cluster two members, 73% have an annual household income of
over $100,000 while only 24% of the cluster one members have the same
household income.
In terms of travel proclivity, Table 8-19 illustrates that cluster one has
a slightly higher percentage of frequent foreign travelers (58%) than
cluster two (55%).
Table 8-16 Presence of children in household by cluster

            Unknown           Without children   With at least one child
            Freq.   Percent   Freq.   Percent    Freq.   Percent
Cluster 1   77      13.8      400     71.8       80      14.4
Cluster 2   25       2.7      479     51.4       428     45.9
Combined    102      6.9      879     59.0       508     34.1
Table 8-17 Household income distribution by cluster

            Less than $50 K   $50 K–$99,999     $100 K or higher
            Freq.   Percent   Freq.   Percent   Freq.   Percent
Cluster 1   123     22.1      302     54.2      132     23.7
Cluster 2   3        0.3      253     27.2      676     72.5
Combined    126      8.5      555     37.3      808     54.3
Table 8-18 Domestic travel behavior by cluster

            Infrequent traveler   Frequent traveler
            Freq.   Percent       Freq.   Percent
Cluster 1   138     24.8          419     75.2
Cluster 2   223     23.9          709     76.1
Combined    361     24.2          1128    75.8
According to Table 8-18, there is no significant difference in domestic travel proclivity between cluster one (75%) and cluster two (76%).
With regard to Internet and cell phone usage (Table 8-20 and Table 8-21),
cluster two shows heavier use of these technologies than cluster one.
Table 8-22 summarizes the characteristics of the two clusters.
Table 8-19 Foreign travel behavior by cluster

            Infrequent traveler   Frequent traveler
            Freq.   Percent       Freq.   Percent
Cluster 1   237     42.6          320     57.5
Cluster 2   416     44.6          516     55.4
Combined    653     43.9          836     56.2
Table 8-20 Internet usage by cluster

            Heavy user        Medium user       Light user
            Freq.   Percent   Freq.   Percent   Freq.   Percent
Cluster 1   160     28.7      133     23.9      264     47.4
Cluster 2   718     77.0      150     16.1      64       6.9
Combined    878     59.0      283     19.0      328     22.0
Table 8-21 Cell phone usage by cluster

            Heavy user        Medium user       Light user
            Freq.   Percent   Freq.   Percent   Freq.   Percent
Cluster 1   302     54.2      205     36.8      50      9.0
Cluster 2   931     99.9      1        0.1      0       0.0
Combined    1233    82.8      206     13.8      50      3.4
The results of the model validation phase are similar to the results of
the model building phase. This is an indication that the model is fairly
robust and stable.
Table 8-22 Summary of cluster characteristics for case study one (model validation stage)

Cluster 1 (Mass)
  Demographics: high concentration of high school and college education; medium household income; average age 67; low presence of children
  Travel propensity: frequent foreign travelers 58%; frequent domestic travelers 75%
  Internet and cell phone usage: light internet usage; high to medium cell phone usage

Cluster 2 (Upscale)
  Demographics: high concentration of college and graduate school education; high household income; average age 51; medium presence of children
  Travel propensity: frequent foreign travelers 55%; frequent domestic travelers 76%
  Internet and cell phone usage: high internet usage; high cell phone usage

■ Case study two: value segmentation

E-commerce firm Global Village needs to analyze its existing customer
base to better understand how customer value is distributed across the
customer base. Customer value is defined as the customer’s annual purchase dollar amount of Global Village products and is the dependent variable for the analysis. There are four independent variables in the analysis:
household income, age, gender, and marital status.
A total of 60,510 customers are included in the analysis. Seventy percent of the data (42,315 customers) is used for model building and 30%
(18,195 customers) is used for validation.
The C & RT module of the SPSS software is used for the analysis. Based
on the CART technique introduced in Chapter 7, C & RT partitions a
group of members to create homogeneous subgroups by maximizing a
Audience Segmentation
metric called ‘improvement’ at each split. This metric is defined by the
decrease in least squares (LS) from the parent node to its child nodes
divided by the total number of members in the data set.
Model building
There are 42,315 customers in the model building data set. Household
income is identified as an independent variable that is highly associated
with purchases. Five clusters are created as a result of the analysis, as
illustrated in Figure 8-1. The higher the household income of a customer,
the more likely he is to purchase from Global Village.
[Figure: CART tree from the model building data set, splitting on household income. Node 0 (42,315 customers, mean annual purchase amount $1326.8; improvement 1,763,200) splits at $200 K into Node 1 (income ≤ $200 K: 35,533 customers, mean $746.7) and Node 2 (income > $200 K: 6782 customers, mean $4366.2). Node 1 splits at $100 K into Node 3 (≤ $100 K: 24,931 customers, mean $536.7) and Node 4 (> $100 K: 10,602 customers, mean $1240.5). Node 3 splits at $75 K into Node 5 (≤ $75 K: 16,156 customers, mean $394.7) and Node 6 (> $75 K: 8775 customers, mean $798.0). Node 4 splits at $150 K into Node 7 (≤ $150 K: 5851 customers, mean $1085.8) and Node 8 (> $150 K: 4751 customers, mean $1431.0). Improvements reported at the lower splits: 87,091; 21,856; 7383.]

Figure 8-1 Decision tree output for case study two (model building stage).
To demonstrate how the metric ‘improvement’ is derived, we focus on
node zero as the parent node and nodes one and two as its child nodes.

The LS in node zero is
\[ \sum_{i=1}^{42{,}315} (X_i - 1326.793)^2 = 5{,}250{,}280{,}769{,}315 \]

The LS in node one is
\[ \sum_{i=1}^{35{,}533} (X_i - 746.676)^2 = 412{,}915{,}613{,}804 \]

The LS in node two is
\[ \sum_{i=1}^{6{,}782} (X_i - 4366.209)^2 = 4{,}762{,}754{,}569{,}013 \]

The difference in LS between node zero and nodes one and two combined is
\[ 5{,}250{,}280{,}769{,}315 - (412{,}915{,}613{,}804 + 4{,}762{,}754{,}569{,}013) = 74{,}610{,}586{,}498 \]

Improvement is the average of the difference in LS over the data set:
\[ \text{Improvement} = \frac{74{,}610{,}586{,}498}{42{,}315} = 1{,}763{,}218 \]

The SPSS C & RT module continues to create the nodes with the maximum
improvement in LS, and the resulting output is illustrated in Figure 8-1.
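The same calculation can be sketched in a few lines of Python. The income threshold and purchase amounts below are made up for illustration and are not the Global Village data.

import numpy as np

def least_squares(x):
    """LS statistic: sum of squared deviations from the node mean."""
    x = np.asarray(x, dtype=float)
    return ((x - x.mean()) ** 2).sum()

def improvement(parent, left, right, n_total):
    """Decrease in LS from the parent node to its two child nodes,
    divided by the total number of members in the data set."""
    return (least_squares(parent) - least_squares(left) - least_squares(right)) / n_total

# Hypothetical annual purchase amounts split on a household income threshold.
purchases = np.array([100, 120, 150, 900, 1100, 4200, 4600, 5000])
incomes = np.array([40, 60, 80, 120, 150, 250, 300, 220])   # in $K
low = incomes <= 200
print(improvement(purchases, purchases[low], purchases[~low], len(purchases)))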
Model validation
There are 18,195 customers in the model validation data set. Figure 8-2
shows the results of the validation analysis. The validation results are
consistent with the model building results, which suggests that the segmentation is
fairly stable and robust.

[Figure: CART tree from the model validation data set, with the same income splits. Node 0 (18,195 customers, mean annual purchase amount $1326.8) splits at $200 K into Node 1 (≤ $200 K: 15,274 customers, mean $758.5) and Node 2 (> $200 K: 2921 customers, mean $4039.0). Node 1 splits at $100 K into Node 3 (≤ $100 K: 10,812 customers, mean $572.8) and Node 4 (> $100 K: 4462 customers, mean $1208.3). Node 3 splits at $75 K into Node 5 (≤ $75 K: 7086 customers, mean $408.4) and Node 6 (> $75 K: 3726 customers, mean $885.5). Node 4 splits at $150 K into Node 7 (≤ $150 K: 2448 customers, mean $1057.7) and Node 8 (> $150 K: 2014 customers, mean $1391.4).]

Figure 8-2 Decision tree output for case study two (model validation stage).

■ Case study three: response behavior segmentation

Direct marketing firm Wonder Electronics has just completed a direct
mail campaign and wants to segment the target audience based on their
responsiveness. Wonder Electronics will leverage the analysis for its
future customer acquisition programs.
In addition to response data, four other attributes are collected and
used in the analysis: marital status, presence of children, household
income, and dwelling type (single family homes or multiple-unit buildings). A total of 10,435 prospects are included in the analysis. Seventy
percent of the data (7279 prospects) is used for model building and 30%
(3156 prospects) is used for validation.
The analysis is conducted with the CHAID module of the SPSS
Classification Trees software. CHAID uses the chi-square statistic; the significance level (p value) is used to determine whether a split of a node in the tree
is statistically significant.
In the current example, the dependent variable (responsiveness to the
direct marketing campaign) has a nonmetric (more specifically, binary)
data type. Therefore, the chi-square statistic is used to determine the statistical significance of splits.
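To make the splitting criterion concrete, the sketch below recomputes the chi-square statistic for the first split in Figure 8-3 (marital status against response) from the node counts shown there. scipy's chi2_contingency is used as a generic stand-in for the test performed inside CHAID.

from scipy.stats import chi2_contingency

# Responder / non-responder counts by marital status, taken from Figure 8-3.
single = [478, 6646 - 478]
married_or_unknown = [88, 633 - 88]

chi2, p_value, dof, _ = chi2_contingency([single, married_or_unknown],
                                         correction=False)
print(round(chi2, 1), dof, p_value)   # roughly 36.3 with 1 df, p < 0.001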
Model building
Figure 8-3 shows the SPSS output of the tree corresponding to the model
building set. Three clusters are created and the overall response rate is
7.8%. The cluster (node two) of nonsingle prospects (married or with
unknown marital status) has the highest response rate (13.9%), followed
by the cluster (node four) of single prospects without children (response
rate 8.8%), and the cluster (node three) of single prospects with children
(response rate 5.8%).
[Figure: CHAID tree for the model building set. Node 0 (7279 prospects, 566 responders, response rate 7.8%) splits on marital status (chi-square 36.3, df 1, p 0.00) into Node 1 (single: 6646 prospects, 478 responders, 7.2%) and Node 2 (married or unknown: 633 prospects, 88 responders, 13.9%). Node 1 splits on presence of children (chi-square 21.7, df 1, p 0.00) into Node 3 (with children: 3531 prospects, 205 responders, 5.8%) and Node 4 (without children: 3115 prospects, 273 responders, 8.8%).]

Figure 8-3 CHAID output for case study three (model building stage).
Validation

The result of the validation set is consistent with that of the model building set, which suggests that the CHAID segmentation model is stable and
robust (Figure 8-4).

[Figure: CHAID tree for the model validation set. Node 0 (3156 prospects, 257 responders, response rate 8.1%) splits on marital status into Node 1 (single: 2872 prospects, 218 responders, 7.6%) and Node 2 (married or unknown: 284 prospects, 39 responders, 13.7%). Node 1 splits on presence of children into Node 3 (with children: 1549 prospects, 103 responders, 6.7%) and Node 4 (without children: 1323 prospects, 115 responders, 8.7%).]

Figure 8-4 CHAID output for case study three (model validation stage).

■ Case study four: customer satisfaction segmentation

Consumer electronics manufacturer Versatile Electronics conducts a customer satisfaction survey to collect overall satisfaction ratings on the various features of its latest camcorder model. There are four rating levels:
four (excellent), three (good), two (fair), and one (poor).
Data on a total of 14,951 responses is collected. Seventy percent (10,416
responders) of the data is used to build a discriminant model and the
remaining 30% (4535 responders) of the data is used for validation. Every
respondent is asked to rate his overall satisfaction of the product and his
level of satisfaction of the following 11 attributes.
1. Quality
2. Price
3. Service
4. Feature A
5. Feature B
6. Feature C
7. Feature D
8. Feature E
9. Feature F
10. Feature G
11. Feature H
The overall satisfaction rating of the product is the dependent variable.
The satisfaction ratings on the 11 attributes are the discriminating variables. Discriminant analysis will derive the discriminant functions, represented by linear combinations of the 11 discriminating variables.
Model building
A sample of 10,416 responses is used for model building. Among them, 3158
are dropped due to incomplete data; as a result, the net number of responders included in the analysis is reduced to 7258. Each of these respondents
belongs to one of four groups depending on his level of overall satisfaction with the product: excellent, good, fair, or poor. The F statistics illustrated
in Table 8-23 are statistically significant (p value < 0.05), which indicates that
there are differences between the four groups in the mean satisfaction ratings of the 11 attributes. Therefore, all 11 attributes are selected as the
discriminating variables for constructing discriminant functions.
Table 8-23 Equality test of the satisfaction rating group means for the 11 attributes in case study four (model building stage)

Feature rated   Wilks’ lambda   F          df 1   df 2   Sig.
Quality         0.565           1859.020   3      7254   0.000
Price           0.746            824.277   3      7254   0.000
Feature A       0.628           1430.819   3      7254   0.000
Feature B       0.596           1636.679   3      7254   0.000
Service         0.562           1880.780   3      7254   0.000
Feature C       0.623           1465.850   3      7254   0.000
Feature D       0.607           1568.744   3      7254   0.000
Feature E       0.617           1498.611   3      7254   0.000
Feature F       0.598           1625.951   3      7254   0.000
Feature G       0.612           1530.941   3      7254   0.000
Feature H       0.806            583.153   3      7254   0.000
We create three discriminant functions for differentiating the four
groups with different levels of overall product satisfaction, based on
Wilks’ lambda (Dillon and Goldstein 1984) in Table 8-24.
Table 8-24 Statistical significance of discriminant functions

Test of function(s)   Wilks’ lambda   Chi-square   df   Sig.
1–3                   0.327           8111.441     33   0.000
2–3                   0.958            312.259     20   0.000
3                     0.987             92.618      9   0.000
Table 8-25 presents the three corresponding eigenvalues, which quantify the percent of total variance explained by particular discriminant
functions.
Table 8-25 Discriminant functions and their corresponding eigenvalues

Function(s)   Eigenvalue   % of Variance   Cumulative %   Canonical correlation
1             1.932        97.8             97.8          0.812
2             0.031         1.6             99.3          0.173
3             0.013         0.7            100.0          0.113
Table 8-26 presents the discriminant coefficients associated with each
discriminant function and discriminating variable.
Table 8-26 Discriminant functions, discriminating variables, and discriminant coefficients

Feature rated   Function 1   Function 2   Function 3
Quality         0.311        0.055        0.433
Price           0.042        0.160        0.059
Feature A       0.189        0.139        0.629
Feature B       0.159        0.046        0.085
Service         0.169        0.161        0.438
Feature C       0.093        0.225        0.018
Feature D       0.093        0.020        0.516
Feature E       0.108        0.063        0.070
Feature F       0.237        0.143        0.507
Feature G       0.245        1.041        0.162
Feature H       0.106        0.331        0.387
We can visualize the distribution of the group members in a two-dimensional plot with the first two discriminant functions as its axes in
Figure 8-5.

[Figure: respondents plotted against discriminant Function 1 (horizontal axis) and Function 2 (vertical axis), with group centroids marked for the four overall-rating groups (poor, fair, good, excellent).]

Figure 8-5 Group distribution by discriminant functions for case study four (model building stage).

Validation

A sample of 4535 responses is used for validation. Among them, 1372 are
dropped due to incomplete data so the net number of responses included
in the validation analysis is 3163. As Table 8-27 shows, the differences in the means of the 11
attributes between the four groups are statistically significant. Table 8-28
demonstrates the statistical significance of the three discriminant functions.
We can see that the results are similar to those from the model building
step. Therefore, we conclude that the discriminant analysis is both stable
and robust.
The eigenvalues corresponding to the discriminant functions quantify
the percentage of total variance explained by these discriminant functions
as illustrated by Table 8-29.
The discriminant coefficients associated with the 11 discriminating variables for the top three discriminant functions are illustrated in Table 8-30.
Table 8-27 Equality test of the satisfaction rating group means for the 11 attributes in case study four (model validation stage)

Feature rated   Wilks’ lambda   F         df 1   df 2   Sig.
Quality         0.543           887.015   3      3159   0.000
Price           0.712           425.810   3      3159   0.000
Feature A       0.617           653.768   3      3159   0.000
Feature B       0.564           813.871   3      3159   0.000
Service         0.523           959.846   3      3159   0.000
Feature C       0.577           772.547   3      3159   0.000
Feature D       0.577           770.958   3      3159   0.000
Feature E       0.584           750.739   3      3159   0.000
Feature F       0.601           698.160   3      3159   0.000
Feature G       0.604           691.480   3      3159   0.000
Feature H       0.812           243.601   3      3159   0.000
Table 8-28 Statistical significance of discriminant functions

Test of function(s)   Wilks’ lambda   Chi-square   df   Sig.
1–3                   0.299           3812.626     33   0.000
2–3                   0.943            184.589     20   0.000
3                     0.989             36.388      9   0.000
Table 8-29 Discriminant functions and the corresponding eigenvalues

Function(s)   Eigenvalue   % of Variance   Cumulative %   Canonical correlation
1             2.159        97.3             97.3          0.827
2             0.048         2.2             99.5          0.214
3             0.012         0.5            100.0          0.107
By plotting the group members on a two-dimensional plot with the
first two discriminant functions as its two axes, we can see the clear distinction between the four groups in Figure 8-6.
Table 8-30 Discriminant functions, discriminating variables, and discriminant coefficients

Feature rated   Function 1   Function 2   Function 3
Quality         0.307        0.367        0.195
Price           0.081        0.034        0.096
Feature A       0.158        0.203        0.684
Feature B       0.166        0.298        0.080
Service         0.208        0.025        0.440
Feature C       0.120        0.089        0.373
Feature D       0.095        0.033        0.378
Feature E       0.104        0.084        0.140
Feature F       0.188        0.049        0.525
Feature G       0.255        0.839        0.299
Feature H       0.092        0.053        0.006
[Figure: respondents in the validation set plotted against discriminant Function 1 (horizontal axis) and Function 2 (vertical axis), with group centroids marked for the four overall-rating groups (poor, fair, good, excellent).]

Figure 8-6 Group distribution by discriminant functions for case study four (model validation stage).
The result of the validation set is consistent with that of the model building set, which suggests that the discriminant model is stable and robust.
■ References
Akaike, H. A new look at the statistical model identification. IEEE Transactions on
Automatic Control, 19(6): 716–723, 1974.
Dillon, W., and M. Goldstein. Multivariate Analysis – Methods and Applications. John
Wiley & Sons, New York, 1984.
CHAPTER 9

Data Mining for Customer Acquisition, Retention, and Growth
This chapter presents three case studies that demonstrate the application
of data mining techniques to acquiring new customers as well as growing
and retaining existing customers. We apply logistic regression to these case
studies, a technique that can be implemented with a variety of tools such
as SAS, IBM Intelligent Miner, Knowledge STUDIO, and SPSS.
■ Case study one: direct mail
targeting for new customer
acquisition
Catalog marketer Mountaineer needs to mail its new holiday catalogs to
a list of prospects. The firm plans to analyze the result of its last holiday
mailing and apply that learning to the upcoming mailing. The last catalog
mailing by Mountaineer was made to a total of 52,000 prospects, which
resulted in a purchase rate of 1.2%, or 624 purchases. At the time of the
last mailing, Mountaineer also defined a control group of 52,000 prospects that did not receive a catalog. The control group yielded a purchase
rate of 0.5%, 260 purchases. The last catalog generated a lift of 140% in
purchase rate from 0.5% to 1.2%. There are four distinct segments in the
mailing group and the control group combined.
● Prospects who made a purchase after receiving a catalog
● Prospects who did not make a purchase after receiving a catalog
● Prospects who made a purchase without receiving a catalog
● Prospects who did not receive a catalog and did not make a purchase.
Obviously, some prospects made a purchase regardless of whether
they received a catalog. In contrast, some prospects made a purchase only
after receiving a catalog. Two probabilities need to be calibrated to fully
understand a prospect’s purchase behavior. These two probabilities are
the probability of making a purchase after receiving a catalog, and the
probability of making a purchase without receiving a catalog. The most
ideal prospects for targeting are those with a very low probability of making a purchase when not receiving a catalog, but with a very high probability of making a purchase after receiving a catalog. The objective of a
targeting model is to identify and select prospects with the highest difference between these two probabilities. This train of thought leads to a two-model approach: one model based on a sample of prospects not receiving
a catalog, and the other model based on a sample of prospects receiving
a catalog. This approach is referred to as the incremental purchase
modeling approach. Models created with this approach follow the standard model building process comprising building, validation, and test
phases. For simplicity, this chapter only presents the final results of the
models without further elaboration on the validation and test phases.
Purchase model on prospects having
received a catalog
Mountaineer utilizes the purchase data of 5790 prospects targeted by the
last catalog for building a purchase model. Data of 4053 prospects (70%
of the data) is used for modeling and the remaining 30% used for model
validation. The dependent variable is a purchase flag indicating the existence of any purchase by a particular prospect. The value of the purchase
flag is one, if any purchase has happened, and zero otherwise. Four prospect attributes are used as the independent variables.
● Age: The five age groups are under 18, 18–24, 25–34, 35–44, and 45–54.
● Household income: There are six income groups: $0–50 K, $50–75 K, $75–100 K, $100–125 K, $125–150 K, and over $150 K.
● Home ownership: The variable has a binary value. A value of one indicates home ownership and zero indicates lack of home ownership.
● No interest in outdoors: Prospects who are interested in outdoors have a value of zero and prospects not interested in outdoors have a value of one for this variable.
The independent variables are all categorical variables and therefore
they need to be transformed into binary variables prior to being fed into a
logistic regression model.
The purchase model is built with the backward stepwise logistic regression method, which employs
a combination of the backward removal and the forward entry methods
(Neter, Wasserman, and Kutner 1990). In the analysis of the Mountaineer
data, the criterion for variables to exit the model in the backward removal
phase is 0.10, and the criterion to enter the model in the forward entry phase is 0.05.
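The sketch below illustrates only the backward-removal half of such a stepwise procedure, using statsmodels on fabricated dummy variables; the variable names and coefficients are hypothetical and are not the Mountaineer estimates.

import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)

# Fabricated dummy-coded prospect attributes; 'noise' carries no signal.
n = 5000
X = pd.DataFrame({
    "age_18_24": rng.integers(0, 2, n),
    "income_75_100k": rng.integers(0, 2, n),
    "no_outdoor_interest": rng.integers(0, 2, n),
    "noise": rng.integers(0, 2, n),
})
score = -2.0 + 1.3 * X["age_18_24"] + 0.6 * X["income_75_100k"] - 1.0 * X["no_outdoor_interest"]
y = (rng.random(n) < 1 / (1 + np.exp(-score))).astype(int)

def backward_removal(X, y, p_exit=0.10):
    """Drop the least significant variable until all p values are below p_exit."""
    cols = list(X.columns)
    while cols:
        model = sm.Logit(y, sm.add_constant(X[cols])).fit(disp=0)
        p_values = model.pvalues.drop("const")
        worst = p_values.idxmax()
        if p_values[worst] <= p_exit:
            return model
        cols.remove(worst)      # remove the variable and refit
    return None

print(backward_removal(X, y).params.round(3))   # 'noise' is expected to be dropped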
Table 9-1 shows that three out of the four independent variables are
predictive of purchase behavior. These three predictive independent variables are age, household income, and interest in the outdoors. Prospects
with the following characteristic are more likely to make a purchase when
receiving a catalog.
Table 9-1 Model parameter estimates for case study one (purchase model for prospects receiving a catalog)

                               Parameter estimate   p value   Exponential of parameter estimate
Intercept                       1.885               0.000     6.584
Age: 18–24                     −1.304               0.000     0.271
Age: Under 18                  −0.191               0.332     0.826
Age: 35–44                      0.115               0.429     1.122
Age: 25–34                     −0.588               0.000     0.556
Income: $75–100 K              −0.246               0.106     0.782
Income: $50–75 K               −0.896               0.000     0.408
Income: $100–125 K             −0.511               0.023     0.600
Income: Less than $50 K        −2.437               0.000     0.087
Income: $125–150 K             −0.743               0.000     0.475
Not interested in outdoors     −1.057               0.000     0.348
● Age between 35 and 44
The probability of a prospect making a purchase after receiving a
catalog is

\[ p = \frac{\exp(L)}{1 + \exp(L)} \qquad (9.1) \]

where

\[ L = 1.885 - 1.304\,\mathrm{Age}_{18\text{--}24} - 0.191\,\mathrm{Age}_{\text{under }18} + 0.115\,\mathrm{Age}_{35\text{--}44} - 0.588\,\mathrm{Age}_{25\text{--}34} - 0.246\,\mathrm{Income}_{75\text{--}100\,K} - 0.896\,\mathrm{Income}_{50\text{--}75\,K} - 0.511\,\mathrm{Income}_{100\text{--}125\,K} - 2.437\,\mathrm{Income}_{<50\,K} - 0.743\,\mathrm{Income}_{125\text{--}150\,K} - 1.057\,\mathrm{Outdoors}_{\text{no interest}} \]

The value of each of the variables on the right hand side of the equation
is one when the condition indicated by the subscript holds and zero otherwise. For instance, the value of the variable Age(18–24) is one when the age
of a prospect is between 18 and 24, and zero otherwise. A positive parameter indicates a higher probability of making a purchase when the condition indicated by the subscript is true. Prospects aged between 35 and
44 are more likely to make a purchase than those aged over 44 or under
35 because the parameter associated with the 35–44 age group is positive.
The estimated probability that a prospect aged between 18 and 24, with a
household income between $75 K and $100 K, and with an interest in outdoors makes a purchase after receiving a catalog is

\[ p_{\text{catalog}} = \frac{\exp(1.885 - 1.304 - 0.246)}{1 + \exp(1.885 - 1.304 - 0.246)} = 0.5825 \qquad (9.2) \]
When a purchase model is used to predict purchase behavior, it is a
common practice to classify a prospect as a potential buyer if his probability of purchase is greater than 0.5 (50%) and as a potential non-buyer
if his probability of purchase is less than 0.5 (50%). A probability of 0.5
indicates uncertain purchase behavior. Thus, the model predicts that
a prospect aged between 18 and 24, with a household income between
$75 K and $100 K, and with an interest in the outdoors is likely to make a
purchase when targeted by a catalog.
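The scoring arithmetic of Eqs. (9.1) and (9.2) is easy to reproduce. The sketch below uses the coefficients from Table 9-1 and the example prospect described above; the dummy-variable names are made up for the illustration.

import math

# Coefficients from Table 9-1 (purchase model for prospects receiving a catalog).
intercept = 1.885
coefficients = {
    "age_18_24": -1.304, "age_under_18": -0.191, "age_35_44": 0.115, "age_25_34": -0.588,
    "income_75_100k": -0.246, "income_50_75k": -0.896, "income_100_125k": -0.511,
    "income_under_50k": -2.437, "income_125_150k": -0.743, "not_interested_in_outdoors": -1.057,
}

def purchase_probability(active_dummies):
    """Logistic model: p = exp(L) / (1 + exp(L)), with L built from the active dummies."""
    L = intercept + sum(coefficients[d] for d in active_dummies)
    return math.exp(L) / (1 + math.exp(L))

# Prospect aged 18-24, income $75-100 K, interested in the outdoors (Eq. 9.2).
p = purchase_probability(["age_18_24", "income_75_100k"])
print(round(p, 4))          # about 0.58; above 0.5, so classified as a likely buyer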
Purchase model based on prospects not having
received a catalog
We next build the second purchase model by utilizing a sample of 9760
prospects who did not receive Mountaineer ’s last catalog. The same independent and dependent variables used in the previous section are used in
building the second purchase model.
We conduct the analysis with the backward stepwise option of the
logistic regression module of SPSS. The statistical significance criteria for
variables to exit and to enter the model are 0.10 and 0.05, respectively. Table
9-2 shows that all four independent variables are predictive of purchase
9-2 shows that all four independent variables are predictive of purchase
behavior. Prospects with the following characteristics are more likely to
make a purchase when not receiving a catalog.
● Age between 45 and 54 or under 18
● Income less than $50 K or between $75 K and $125 K
● With home ownership
● Not interested in outdoors.
The probability of a prospect making a purchase without receiving a
catalog is

\[ p_{\text{no catalog}} = \frac{\exp(L)}{1 + \exp(L)} \qquad (9.3) \]
Table 9-2 Model parameter estimates for case study one (purchase model for prospects not receiving a catalog)

                               Parameter estimate   p value   Exponential of parameter estimate
Intercept                      −2.934               0.000      0.053
Age: 45–54                      0.766               0.004      2.152
Age: 35–44                      0.074               0.780      1.077
Age: 25–34                     −0.100               0.717      0.905
Age: Under 18                   0.520               0.074      1.682
Income: $75–100 K               0.734               0.000      2.084
Income: $50–75 K                0.327               0.047      1.387
Income: $100–125 K              1.416               0.000      4.122
Income: Less than $50 K         0.974               0.011      2.648
No home ownership              −0.411               0.000      0.663
Not interested in outdoors      2.441               0.000     11.487
with

\[ L = -2.934 + 0.766\,\mathrm{Age}_{45\text{--}54} + 0.074\,\mathrm{Age}_{35\text{--}44} - 0.100\,\mathrm{Age}_{25\text{--}34} + 0.520\,\mathrm{Age}_{\text{under }18} + 0.734\,\mathrm{Income}_{75\text{--}100\,K} + 0.327\,\mathrm{Income}_{50\text{--}75\,K} + 1.416\,\mathrm{Income}_{100\text{--}125\,K} + 0.974\,\mathrm{Income}_{<50\,K} - 0.411\,\mathrm{Home}_{\text{no home}} + 2.441\,\mathrm{Outdoors}_{\text{no interest}} \]
As an application of this model estimation, consider the probability that
a prospect aged between 18 and 24, with a household income between
$75 K and $100 K, and with an interest in outdoors makes a purchase
without receiving a catalog:

\[ p_{\text{no catalog}} = \frac{\exp(-2.934 + 0.734)}{1 + \exp(-2.934 + 0.734)} = 0.0998 \qquad (9.4) \]
Based on Eqs. (9.2) and (9.4), we conclude that prospects aged between
18 and 24, with a household income between $75 K and $100 K, and with
an interest in outdoors are not likely to make a purchase without receiving
a catalog but become likely to make a purchase after receiving a catalog.
Prospect scoring
So far Mountaineer has built two targeting purchase models. The first
model is used to predict the probability of purchase if targeted by a catalog. The second model is used to predict probability of purchase when
not mailed a catalog. Mountaineer will apply these two models to select
its targeted mailing list.
Mountaineer acquires a list of 100,000 potential prospects for the next
catalog mailing and has a budget for targeting only 20,000 of them.
The firm also plans to set aside a control group of 20,000 prospects that
will not receive a catalog. To effectively select the prospect target list,
Mountaineer undertakes the following steps.
1. Randomly select 20,000 prospects for the control group.
2. Score the remaining 80,000 prospects with the first purchase model to
compute their expected purchase probability if targeted by a catalog.
These prospects then are classified as buyers and non-buyers depending on whether their purchase probability exceeds the 0.5 threshold.
3. Score the same 80,000 prospects with the second purchase model to
calculate their probability of making a purchase if not targeted by a
catalog. The same probability threshold of 0.5 is used to classify the
prospects as buyers and non-buyers when not targeted.
4. Select the prospects predicted to be buyers by the first purchase model
and to be non-buyers by the second purchase model.
5. From the prospects resulting from the previous step, select the 20,000 with the highest probabilities of making a purchase when targeted by a catalog; a sketch of this selection logic is shown below.
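In code, the selection logic of steps 2–5 can be sketched as follows. The functions score_with_catalog and score_without_catalog stand in for the two fitted purchase models and, like the demo data at the end, are hypothetical.

import numpy as np

def select_mailing_list(prospects, score_with_catalog, score_without_catalog,
                        list_size=20_000, threshold=0.5):
    """Pick prospects predicted to buy only if mailed a catalog, ranked by
    their purchase probability when targeted."""
    p_catalog = np.array([score_with_catalog(p) for p in prospects])
    p_no_catalog = np.array([score_without_catalog(p) for p in prospects])

    # Buyers when mailed, non-buyers when not mailed (steps 2-4).
    eligible = np.where((p_catalog > threshold) & (p_no_catalog < threshold))[0]

    # Highest catalog-driven purchase probabilities first (step 5).
    ranked = eligible[np.argsort(-p_catalog[eligible])]
    return ranked[:list_size]

# Tiny usage example with made-up scoring functions and fabricated attributes.
rng = np.random.default_rng(0)
prospects = rng.random((1000, 3))
demo_with = lambda p: 1 / (1 + np.exp(-(p.sum() - 1.0)))
demo_without = lambda p: 1 / (1 + np.exp(-(p.sum() - 2.5)))
print(len(select_mailing_list(prospects, demo_with, demo_without, list_size=100)))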
Modeling financial impact
This section illustrates the financial impact of purchase models based on
the following assumptions of Mountaineer.
● Cost per catalog (cost of catalog production and mailing postage): $0.5
● Average purchase amount per buyer: $80
● Average cost of goods and shipment: $50
● Expected purchase rate of the mailing of 20,000 prospects: 3.0%
The net revenue from the mailing group is equal to

\[ \text{Total sales} - \text{Total costs} = \$80 \times (20{,}000 \times 3\%) - \$50 \times (20{,}000 \times 3\%) - \$0.5 \times 20{,}000 = \$8{,}000 \]

The return on investment of the mailing group is

\[ \frac{\text{Net revenue}}{\text{Total cost}} = \frac{\$8{,}000}{\$40{,}000} = 20\% \]
By applying purchase models to improve the purchase rate of the mailing group, Mountaineer can generate a 20% return on investment.
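The same arithmetic in a short script, with the assumptions listed above as inputs:

# Financial impact of the targeted mailing, using the assumptions listed above.
mailed = 20_000
purchase_rate = 0.03
cost_per_catalog = 0.50
revenue_per_buyer = 80.0
cost_of_goods_per_buyer = 50.0

buyers = mailed * purchase_rate
total_sales = revenue_per_buyer * buyers
total_costs = cost_of_goods_per_buyer * buyers + cost_per_catalog * mailed
net_revenue = total_sales - total_costs
roi = net_revenue / total_costs

print(net_revenue, roi)   # $8,000 net revenue, 20% return on investment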
■ Case study two: attrition modeling
for customer retention
Credit lender UNC Bank has lately observed a high attrition rate among
its small business customers. UNC plans to build an attrition model to
predict future attrition behavior, with the intention of detecting in a timely manner customers who are likely to defect.
UNC has collected six variables (company size, headquarters location, industry, number of years since establishment, annual sales volume,
and number of offices) about its small business customers. These are the
independent variables. The dependent variable is a binary attrition flag
with a value of one indicating attrition, and zero otherwise. UNC utilizes
the historical data of 47,056 customers for building an attrition model.
Data on 32,939 customers (70% of the data) is used for modeling and the
remaining 30% is used for model validation.
The attrition model is created with the stepwise logistic regression
module in SPSS. The statistical significance criteria for variables to exit
and to enter the model are 0.10 and 0.05, respectively. Customers with the
following characteristics are more prone to attrition.
● Customer headquarters located in NV, PR, VI, AR, DC, AL, or FL
● Customers in one of the following industries: apparel, manufacturing, insurance, services, auto repair, restaurants, food catering, transportation, mortgage, or travel
● Company size (number of employees) smaller than 250
● Company annual revenue between $10 M and $50 M.
Based on a logistic regression analysis, the probability of a customer
canceling his credit account is

\[ p_{\text{attrition}} = \frac{\exp(L)}{1 + \exp(L)} \]
The SPSS model output constructs L from an intercept of 1.823 and coefficients on the following dummy variables; terms corresponding to the attrition-prone characteristics listed above enter with a positive sign.

State: CA 0.370, AZ 0.281, CT 0.321, OH 0.234, NM 0.586, NY 0.157, RI 0.527, GA 0.005, NV 0.557, PA 0.439, MA 0.364, PR 0.045, DE 0.295, VA 0.216, WA 0.360, VI 0.132, OR 0.085, AR 0.115, DC 0.413, AL 0.640, FL 1.324
Industry: apparel 0.710, banking 0.029, manufacturing 0.217, insurance 0.029, services 0.018, auto repairs 0.084, restaurant 0.286, education 0.054, food catering 0.798, construction 0.123, transportation 0.153, mortgage 0.128, travel 0.191, legal 0.467
Number of years in business: 0.682
Company size (employees): 250 or more 0.049, 100–249 0.135, 50–99 0.209, 20–49 0.290, 1–19 0.199
Scope: regional 0.3642, national 0.275
Annual sales: ($25 M, $50 M] 0.021, ($10 M, $25 M] 0.118, ($5 M, $10 M] 0.167, ($1 M, $5 M] 0.531, $1 M or less 0.849     (9.5)
As an application of the model in (9.5), consider a company in the apparel business that is located in Florida, has 250 employees, and has annual
sales of $50 million. Replacing these values in (9.5), and assuming that
the remaining dummy variables take on the value 0, we get the following
probability of attrition:

\[ p_{\text{attrition}} = \frac{\exp(1.823 + 1.324 + 0.710 + 0.049 + 0.021)}{1 + \exp(1.823 + 1.324 + 0.710 + 0.049 + 0.021)} = 0.98 \]
UNC Bank can leverage the above attrition model to predict which customers are likely to cancel their credit lines. The bank can salvage these
customers by offering them incentives such as favorable credit terms.
■ Case study three: customer
growth model
Companies can leverage customer growth models to increase sales to
their existing customers. Customer growth models predict the probability that a customer will grow his purchase amount over time. In practice,
most companies focus on customers likely to have significant growth
rather than marginal growth in their purchase amount. A growth model is
built on historical data where the dependent variable is a binary growth
flag with a value of one or zero. A customer with a growth flag of one is
a customer that grew his purchase amount above a pre-determined percentage during a specific time frame. This pre-determined percentage is
set arbitrarily by a company. A customer with a growth flag of zero is one
whose growth in purchase amount does not meet this pre-determined
growth percentage. The following section discusses a case study where
stepwise logistic regression is applied to build a customer growth model.
Insurance company Safe Net plans to pilot a comprehensive insurance
package that allows policyholders to combine several types of insurance
policies (health, auto, life, accidental, and home) into one policy with a
fixed premium. This new product will allow the company to minimize
operational costs and to increase revenues by cross-selling additional
insurance coverage to existing policyholders. This new product will also
benefit policyholders in that it will make it easy for them to manage multiple policies and to enjoy potential discounts. In order to take advantage
of the highest revenue potential, Safe Net decides to target those policyholders that are likely to grow their insurance purchases with the firm.
With this objective in mind, Safe Net needs to build a growth model to
predict each customer ’s probability of increasing his insurance purchase.
Safe Net defines a growth policyholder as one that increases his insurance purchase by at least 5% over the past three years. The firm draws a
random sample of 95,953 policyholders. Data on 70% of the policyholders, or 66,915 of them, are used for model building and the remaining 30%
is used for validation. The following five policyholder attributes are used as
independent variables.
● Family annual income
● Residence state
● Profession
● Number of members in the family
● Policyholder age.
The analytic result shows that the probability of growth at the policyholder
level is

\[ p_{\text{growth} \ge 5\%} = \frac{\exp(L)}{1 + \exp(L)} \]
where L is constructed from an intercept of 1.770 and coefficients on the following dummy variables.

State: CA 2.292, AZ 2.230, CT 2.264, OH 2.161, NM 2.121, NY 1.946, RI 1.997, GA 1.972, NV 2.097, PA 2.132, MA 2.054, PR 2.115, DE 2.175, VA 2.067, WA 2.263, VI 2.075, OR 2.157, AR 2.060, DC 2.287, AL 2.097, FL 2.909
Profession: apparel 0.152, banking 0.060, manufacturing 0.073, insurance 0.033, services 0.116, auto repairs 0.042, restaurant 0.022, education 0.036, food catering 0.041, construction 0.080, transportation 0.047, mortgage 0.047, travel 0.076, legal 0.253
Age: 35–44 0.146, under 18 0.086, 18–24 0.004, 45–54 0.017
Household size: 1 member 0.242, 2 members 0.486, 3 members 0.566, 4 members 0.209, 5 members 0.016, 6 members 0.169     (9.6)
From the estimates in (9.6), we conclude the following.
● Geographically, policyholders in FL, DC, and CA are more likely to grow their insurance purchase than policyholders in the other states.
● Policyholders in apparel, manufacturing, and construction are more likely to increase their purchases than policyholders working in other industries.
● The 35–44 age group has a higher likelihood of increasing insurance purchase amount than other age groups.
The probability of growing the insurance purchase amount for a policyholder located in CA, working in the manufacturing industry, aged
between 35 and 44, and with a household of four people is

\[ p_{\text{growth} \ge 5\%} = \frac{\exp(-1.770 + 2.292 + 0.073 + 0.146 - 0.209)}{1 + \exp(-1.770 + 2.292 + 0.073 + 0.146 - 0.209)} = 0.63 \]
Safe Net can use the growth model in (9.6) to score its existing policyholders and target those with the highest probability of growth with the firm’s
new insurance package. With this approach, in order to ensure a maximum level of marketing returns, the firm invests its marketing dollars in
those customers most likely to grow their insurance purchase.
■ Reference
Neter, J., W. Wasserman, and M.H. Kutner. Applied Linear Statistical Models.
Irwin, Homewood, IL, 1990.
CHAPTER 10

Data Mining for Cross-Selling and Bundled Marketing
Cross-selling and bundled marketing both refer to marketing additional
products to existing customers. This chapter presents two case studies that demonstrate the application of data mining to creating effective
cross-selling or bundled marketing strategies. The case studies utilize
association analysis (also known as market basket analysis), a data mining technique that can be implemented with software applications such
as Association Engine, SAS Enterprise Miner, and IBM Intelligent Miner.
Association analysis helps us understand which products customers purchase together. The technique addresses the question: if a customer purchases product A, how likely is he to purchase product B? The
question is often expressed in terms of a ‘rule,’ written as follows:
A ⇒ B
As described in Chapter 7, three standard measures are used to assess
the significance of a rule: support, confidence, and lift. Confidence is the
conditional probability that a customer will purchase product B given that
he has purchased product A. Lift measures how many times more likely
is a customer who has purchased product A to purchase product B, when
compared with those who have not purchased product A. Support is the
percentage of total transactions where A and B are purchased together
during a specified period of time.
■ Association engine
Association Engine, a tool developed by Octanti Associates, Inc. (www.
octanti.com) is used to analyze the data. The tool was developed in the
C programming language and has an EXCEL interface.
There are three sections displayed by the EXCEL interface of Associate
Engine. The first section titled ‘Input’ requires the following input fields
from the user.
● Minimum product count: Products need to meet the minimum count to be included in the analysis. If the minimum product count is 50, then only those products with at least 50 purchase records are included in the analysis.
● Maximum rule output: This field specifies the number of rules to be displayed in the output section.
● Input file: This field identifies the location and the name of the input file. The input file needs to be in text format with three variables (ID, PID, and product). ID, PID, and product refer to customer ID, product ID, and product name, respectively. The user of Association Engine has the option of showing the analysis results with either product ID or product name.
● Rule sorting: The user can specify how the results are sorted. The results can be sorted by support, by lift, or by confidence.
● Rule output: This field gives a user the option of outputting the result with either product ID or product name.
● Engine directory: This is where a user can specify the location of the Association Engine application.
On the lower left corner of the display screen of Association Engine is
the data summary section. This section shows the basic statistics of the
input data.
● Original number of records: This field indicates the number of transaction records in the input file.
● Number of records after de-dupping: Association Engine performs an automatic de-dupping of the raw transaction data. If a customer purchases product A three times, there will be three duplicate records of the customer–product combination in the raw data. The de-dupping process of Association Engine will remove two of the three records and create a unique record.
● Number of records after filtering: This field shows the number of transaction records after Association Engine removes the products that do not meet the minimum product count requirement.
● Number of customers before filtering: This shows the number of unique customer IDs before Association Engine removes the products that do not meet the minimum product count requirement.
● Number of customers after filtering: This indicates the number of unique customer IDs after removing products that do not meet the minimum product count requirement.
● Number of products before filtering: This field is the number of unique product IDs before removing products that do not meet the minimum product count requirement.
● Number of products after filtering: Number of unique product IDs after removing products that do not meet the minimum product count requirement.
On the right-hand side of the display screen of Association Engine is the analysis output section, where the top rules and their three statistics (support, lift, and confidence) are displayed.
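The de-dupping and minimum-product-count filtering steps described above are straightforward to reproduce on one's own data. The following is a minimal pandas sketch, assuming a tab-delimited input file named transactions.txt with the three fields (ID, PID, and product) that Association Engine expects; the file name and the threshold of 50 are illustrative, not part of the tool:

import pandas as pd

# Hypothetical input file with the three fields described above
df = pd.read_csv("transactions.txt", sep="\t", names=["ID", "PID", "product"])

# De-dupping: keep one record per customer-product combination
deduped = df.drop_duplicates(subset=["ID", "product"])

# Minimum product count filtering: drop products with fewer than 50 purchase records
counts = deduped["product"].value_counts()
keep = counts[counts >= 50].index
filtered = deduped[deduped["product"].isin(keep)]

print(len(df), "raw records,", len(deduped), "after de-dupping,", len(filtered), "after filtering")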
■ Case study one: e-commerce cross-sell
E-commerce firm Horizon offers an array of consumer products online.
The company wants to increase the sale of LCD Flat Panel HDTVs given
the very attractive margins of these products. The marketing executives
at Horizon believe that it would be less expensive and more profitable to
cross-sell this product to existing customers than it would be to acquire
brand new customers.
Horizon has 2 years of transactional data, which consists of basic customer information (customer ID, names, and addresses) and purchase information
(product purchase date, product purchased, units purchased, and purchase dollar amount). A cross-selling analysis is conducted by utilizing
a 5% random sample (69,671 records) from the 2-year transactional data.
Two specific variables, customer ID and product purchased, are used for
the analysis. Table 10-1 illustrates partial raw data used for the analysis.
Table 10-1 Raw data subset used for Horizon cross-selling analysis

Customer ID    Product purchased
10001          HP 710C Inkjet Printer
10001          IBM Thinkpad Laptop
10002          Toshiba laptop
10003          Apollo p1200 Color Inkjet
10003          LCD Flat Panel HDTV
10004          Camcorder with HD Video
10004          LCD Flat Panel HDTV
10004          VCR
10004          Sony VAIO laptop
Model building
Around half of the sample data, or 34,929 transaction records, is used for
building an association model. After de-dupping and removal of products that do not meet the minimum product count requirement (50 in this
case), 27,264 unique transaction records, 18,172 unique customers and 148
products are included in the analysis. Table 10-2 shows the partial results
of the analysis. Association Engine outputs the first 20 rules as instructed.
Consider the second and the fourth rule as examples. The second rule in
the main output shows that those who purchase VCRs often purchase LCD
Flat Panel HDTV as well. The three statistics (support, lift, and confidence)
for the second rule are 0.17%, 92.08, and 43.37%, respectively. The support statistic, 0.17%,
is the percentage of customers that purchased both VCRs and LCD Flat
Panel HDTVs. The lift statistic, 92.08, shows that customers who purchased
VCRs were 92.08 times more likely to purchase LCD Flat Panel HDTVs than
randomly selected customers. The confidence statistic is the conditional
probability that indicates that 43.37% of the customers who purchased
VCRs also purchased LCD Flat Panel HDTVs. The fourth rule in the main
output shows that Sony VAIO laptops were often purchased together
with LCD Flat Panel HDTVs by the same customers. The support statistic indicates that 0.12% of the customers purchased both products. Those
who purchased Sony VAIOs were 89.39 times more likely to have purchased
LCD Flat Panel HDTVs. A significant percentage (42.11%) of those customers who purchased SONY VAIOs bought LCD Flat Panel HDTVs.
Table 10-2 Partial association analysis results for case study one (model building stage)

Rule                                                                                        Support (%)   Lift     Confidence (%)
StarOffice 5.0 Personal Edition for Linux (Intel) → Red Hat Linux 6.0 for Intel Systems       0.53       20.39       49.11
VCR → LCD Flat Panel HDTV                                                                      0.17       92.08       43.37
Seagate 6.5 GB EIDE Hard Drives → Hard Drive Cable Pack                                        0.11      104.36       43.14
Sony VAIO laptop → LCD Flat Panel HDTV                                                         0.12       89.39       42.11
Zobmondo!! Lite boardgame → Zobmondo!! Original boardgame                                      0.12       88.31       40.32
Sony VAIO laptop → VCR                                                                         0.11       96.75       38.60
LCD Flat Panel HDTV → VCR                                                                      0.17       92.08       36.73
KB Gear JamCam/Web Page Power Pack Combo → KB Gear JamCam                                      0.12       16.93       34.67
Canon digital camera → IBM Thinkpad                                                            0.47       15.63       32.23
HP 710C inkjet printer → Toshiba laptop                                                        0.23       22.10       30.38
Apollo P1200 color inkjet → Toshiba laptop                                                     0.09       21.94       30.16
Camcorder with HD video → IBM Thinkpad                                                         0.28       14.24       29.35
Apollo P1200 color inkjet → DISC burner                                                        0.09       54.53       28.57
DVD player → IBM Thinkpad                                                                      0.24       13.66       28.16
DVD player → Canon digital camera                                                              0.23       18.67       27.01
VCR → Sony VAIO laptop                                                                         0.11       96.75       26.51
Zobmondo!! Original boardgame → Zobmondo!! Lite boardgame                                      0.12       88.31       26.32
Camcorder with HD video → Canon digital camera                                                 0.25       17.88       25.87
Hard Drive Cable Pack → Seagate 6.5 GB EIDE Hard Drives                                        0.11      104.36       25.58
Apollo P1200 color inkjet → HP 710C inkjet printer                                             0.08       33.44       25.40
Model validation
The remaining data sample (34,742 transactional records) is used for
model validation. After de-dupping and removal of products that do not
meet the minimum product count requirement, 27,143 unique transaction
records, 18,170 unique customers and 150 products are included in the
analysis. Table 10-3 illustrates the first 20 rules created by the Association
Engine. The third and the fifth rule in the results are similar to the second
and the fourth rule in the model building output previously discussed.
The third rule in the main output shows that those who purchased VCRs
were likely to purchase LCD Flat Panel HDTVs. The three statistics (support, lift, and confidence) of this particular rule are 0.17%, 91.78, and
39.33%. The fifth rule in the validation results indicates that those who
purchased Sony VAIO laptops were likely to have purchased LCD Flat
Panel HDTVs. The three statistics (support, lift, and confidence) of the
fifth rule are 0.11%, 81.50, and 34.92%, respectively. The validation results
are consistent with the model results, which confirms that Horizon should cross-sell LCD Flat Panel HDTVs to two customer segments: those who purchased VCRs and those who purchased Sony VAIO laptops.
Table 10-3 Validation results for association analysis in case study one

Rule                                                                                        Support (%)   Lift     Confidence (%)
StarOffice 5.0 Personal Edition for Linux (Intel) → Red Hat Linux 6.0 for Intel Systems       0.62       19.59       48.86
LCD Flat Panel HDTV → VCR                                                                      0.17       91.78       39.33
VCR → LCD Flat Panel HDTV                                                                      0.17       91.78       39.33
Seagate 6.5 GB EIDE Hard Drives → Hard Drive Cable Pack                                        0.11       97.10       38.33
Sony VAIO laptop → LCD Flat Panel HDTV                                                         0.11       81.50       34.92
Sony VAIO laptop → VCR                                                                         0.10       77.79       33.33
KB Gear JamCam/Web Page Power Pack Combo → KB Gear JamCam                                      0.12       16.03       33.33
Canon digital camera → IBM Thinkpad                                                            0.50       15.30       32.49
Zobmondo!! Lite boardgame → Zobmondo!! Original boardgame                                      0.09       86.31       31.58
DVD player → Canon digital camera                                                              0.26       20.19       30.81
Camcorder with HD video → IBM Thinkpad                                                         0.33       14.30       30.36
DISC burner → Toshiba laptop                                                                   0.15       23.20       28.70
Hard Drive Cable Pack → Seagate 6.5 GB EIDE Hard Drives                                        0.11       97.10       28.05
DVD player → IBM Thinkpad                                                                      0.23       12.87       27.33
DVD player → Camcorder with HD video                                                           0.22       24.26       26.16
HP 710C inkjet printer → Toshiba laptop                                                        0.18       20.34       25.17
Camcorder with HD video → Canon digital camera                                                 0.27       16.38       25.00
Red Hat Linux 6.0 for Intel Systems → StarOffice 5.0 Personal Edition for Linux (Intel)        0.62       19.59       24.90
LCD Flat Panel HDTV → Sony VAIO laptop                                                         0.11       81.50       24.72
Hard Drive Cable Pack → Maxtor 13.6 Gigabyte Hard Drive                                        0.10       52.77       24.39
■ Case study two: online advertising promotions
Online advertising is another area where cross-selling analysis can be
applied to create effective targeting strategies. Association analysis can
be used to determine which additional advertisements to offer to web
visitors after they respond to a particular advertisement. The following
case study illustrates this concept.
Financial retailer Netting, Inc. wants to optimize the performance of its
online banner advertisement, which presents discount offers on various
investment products such as mutual funds, 401 K plans, and retirement
plans. Performance of an online banner ad is defined as the number of
leads generated by the banner in a given period of time. A lead is a website visitor who clicks on a banner ad and fills out a registration form to
download a white paper posted in the ad.
Netting has 1 year of historical data, consisting of basic information
(visitor ID, name, postal address, and e-mail address) on website visitors who responded to a list of ads. Netting draws a 5% random sample
(10,266 records) from the historical data and decides to use two variables, visitor ID and advertisement responded to, in the association analysis. Table 10-4 illustrates the data subset used for the analysis.
Table 10-4 Data subset used in association analysis in case study two

Website visitor ID    Banner
11                    Life Insurance
11                    Home Equity
11                    ETF
13                    401K
13                    ETF
15                    ETF
15                    Home Mortgage
15                    Home Equity
16                    Home Mortgage
Model building
Netting utilizes half of the data (5133 lead records) for building an
association model. After de-dupping and removal of ads that do not
meet the minimum product count requirement of ten, 5066 unique lead
records, 2258 unique visitors and 20 ads are included in the analysis. The
Association Engine displays the first ten rules as specified by the user.
Table 10-5 shows the results corresponding to these ten rules. Take the
first and the fifth rule as examples. The first rule indicates that those visitors who responded to mutual funds ads often also responded to ads on
exchange traded funds (ETF). The three statistics (support, lift, and confidence) associated with the first rule are 1.33%, 4.18, and 46.88%, respectively. The support statistic indicates that 1.33% of the visitors in the
model building data set responded to both mutual funds and ETF ads.
The lift value suggests that visitors who responded to mutual funds ads
were 4.18 times more likely to respond to ETF ads than randomly selected
visitors were. The confidence statistic indicates that 46.88% of the visitors
who responded to mutual funds ads also responded to ETF ads. The
fifth rule in the analysis results shows that visitors who respond to home
mortgage ads are likely to respond to home equity ads as well. The three statistics (support, lift, and confidence) associated with the fifth rule are 1.86%, 1.22, and 41.18%, respectively.
Table 10-5 Association analysis partial results for case study two (model building stage)

Rule                              Support (%)   Lift   Confidence (%)
Mutual Funds → ETFs                  1.33       4.18      46.88
Stocks → Home Equity                 4.38       1.32      44.59
ETFs → 401 K                         1.64       2.16      43.02
Futures → ETFs                       1.55       3.81      42.68
Home Mortgage → Home Equity          1.86       1.22      41.18
Home Mortgage → 401 K                1.82       2.02      40.20
Fixed Income → Home Equity           1.99       1.17      39.47
Home Mortgage → Life Insurance       1.77       2.75      39.22
401 K → Home Equity                  7.62       1.14      38.31
Home Mortgage → Options              1.73       4.40      38.24
Model validation
Netting uses the remaining 5133 records for model validation. After
de-dupping and removal of ads that have fewer than 10 records, there remain 4989 unique lead records, 2302 unique visitors, and 20 ads in the analysis. The Association Engine displays the first
ten rules. Table 10-6 shows the model validation results. Six out of the
ten rules are consistent with the rules generated from the model building
process. The six rules are
● Futures → ETF
● Mutual Funds → ETF
● Home Mortgage → Home Equity
● Home Mortgage → Life Insurance
● Fixed Income → Home Equity
● 401 K → Home Equity.
Consider the first and the second rule as examples. The first rule
(Futures → ETFs) indicates that web visitors who responded to futures ads were likely to respond to ETF ads. For this rule, the support, lift, and confidence statistics are 1.91%, 4.12, and 53.66%, respectively. The second
rule (Mutual Funds→ETF) suggests that web visitors who responded to
mutual funds ads were likely to respond to ETF ads as well. In this case,
Table 10-6 Validation results for association analysis in case study two

Rule                              Support (%)   Lift   Confidence (%)
Futures → ETFs                       1.91       4.12      53.66
Mutual Funds → ETFs                  1.56       4.00      52.17
Home Mortgage → Home Equity          1.87       1.39      45.74
Home Mortgage → Life Insurance       1.74       2.59      42.55
Fixed Income → Home Equity           2.09       1.20      39.67
Life Insurance → Home Equity         6.34       1.17      38.62
401 K → Home Equity                  6.86       1.16      38.35
ETFs → Credit Card                   1.39       2.90      37.21
Savings → Home Equity                3.34       1.13      37.20
Credit Card → Home Equity            4.69       1.11      36.61
the support, lift, and confidence are 1.56%, 4, and 52.17%, respectively.
Results indicate that if Netting wants to boost the number of leads from
ETF ads, it should target those who have already responded to futures or
mutual fund ads. To increase the number of its leads from home equity
ads, the firm should present its home equity ads to web visitors who have
already responded to home mortgage, fixed income, or 401 K advertising.
To improve the number of leads for life insurance, the firm should offer
its life insurance ads to those who have responded to home mortgage ads.
CHAPTER 11
Web Analytics
According to JupiterResearch (www.jupiterresearch.com), the US online advertising market is expected to grow from $9.3 billion in 2004 to $19 billion in 2010. This rapid growth is partially driven by the fact that, with web analytics tools, data on online advertising performance can be collected, analyzed, and optimized in a timely and efficient manner. The growth in
online advertising is also driven by the rapid development and increasing influence of search marketing, a topic that we will discuss in depth in
Chapter 12.
In this chapter we focus on web analytics and metrics. Web analytics is
often implemented separately from marketing research and data mining.
Synergy among these three areas should be explored to effectively capture a bird’s eye view of the market and its customers.
■ Web analytics overview
Web analytics comprises a variety of techniques for measuring and
analyzing web activity. Such techniques can be implemented with a
variety of tools, such as those supported by Webtrends Analytics
(www.webtrends.com), Omniture SiteCatalyst (www.omniture.com),
Coremetrics (www.coremetrics.com), and Google Analytics (www.google.
com). The following is a list of common applications of web analytics.
● Measuring web site visitor activities across the five stages of a sales funnel (awareness, interest and relevance, consideration, purchase, and loyalty and referral): Various metrics are used to gauge visitor activities at every stage of the funnel. Examples of these metrics are number of visits, number of clicks, number of page views, navigation paths, time spent on site, number of downloads, and purchases. These visitor activities can be either organic or triggered by marketing events such as lead generation and e-commerce. Web visitor activities that result from a marketing effort can be tracked to measure the performance and impact of the marketing program. In the absence of any marketing effort, web visitor activities can be analyzed to better understand the effectiveness of web site architecture and usability.
Optimization of marketing efforts is done by feeding web analytics output back into future marketing, campaign planning, and execution. Some common software tools to accomplish this include:
● Ad serving platforms, such as DoubleClick's DART (www.doubleclick.com)
● Ad content management application
● Search marketing bid management software
● Site search system
● Targeting and segmentation application
● E-commerce platform.
As previously mentioned, web analytics has made a significant contribution to the increasing importance of online advertising and marketing. Timely measurement of online marketing has been greatly facilitated by web analytics tools. However, given the multiplicity of operational metrics that can be collected, the volume of data generated by typical web analytics tools can sometimes be overwhelming. To keep this task manageable, it is important to identify the key operational metrics that are highly correlated with the success metrics (or key performance indicators). It is not necessary to track all metrics. Rather, the focus should be on tracking and optimizing the key operational metrics to maximize marketing success. Although web analytics tools are effective in collecting web data and generating reports and analysis, human intervention is essential to correctly interpret results and derive actionable recommendations.
■ Web analytic reporting overview
Standard web analytics tools provide reporting capabilities for a variety
of web site visitor activities. These reports show information snapshots,
and display trends over a period of time. It is important to select truly relevant reports based on the objectives of the marketing plan.
The following sections discuss the common objectives of online marketing and the role of web analytics in serving these objectives.
Brand or product awareness generation
The visitor's awareness of a particular brand or product can be measured
directly through online surveys supported by numerous software tools.
Most of these tools can be used to create questionnaires, administer and
collect survey data, and perform analysis. Zoomerang (www.zoomerang.
com) is an example of such an online survey tool. Online survey and analysis services are also provided by survey companies, such as SurveySite
(www.surveysite.com). Survey studies, if conducted on a regular basis,
can directly measure changes in web site visitor awareness levels.
In the absence of awareness surveys, proxy metrics may be used to estimate changes in visitor awareness. These proxy metrics track web traffic
volume in one form or another, and are utilized under the assumption
that the more frequently visitors are exposed to web marketing, the more
their awareness increases as a result. Higher traffic volume to the web site
indicates a higher exposure to product or brand information. Common
proxy metrics are:
● Number of visits
● Number of unique visitors
● Number of new visitors
● Number of returning (repeat) visitors
● Number of page views
● Average page views per visitor
● Average time spent on the site.
Consider the following operational metrics that may influence changes
in visitor awareness. These metrics are mainly content-related.
● Entry page
● Referral URL
● Exit page
● Most requested pages
● Search keyword.
Some of these metrics, such as search keywords, address the question of why a visitor visits the site. As a result, better-targeted ads and messages can be created with a clear understanding of why visitors
visit a web site. Some web analytics tools offer visitor segmentation capability for analyzing data on visitor behavior, reasons for visiting a web site,
and visitor demographics.
With the increasing popularity of social networking communities such
as blogs, some web analytics tools now offer additional analytic capabilities to analyze data collected about community members. For instance,
Omniture’s SiteCatalyst 13 (www.omniture.com) provides reporting and
analysis capabilities for measuring consumption and influence of social
networking and blog content.
Web site content management
Online publishers such as CNET.com and media site owners such as
Yahoo.com generate revenues partially through reader subscriptions or
sales of online real estate (web site space) to advertisers. The following is
a list of key factors that online publishers or media site owners may consider for pricing.
● Number of impressions: An impression is an instance in which a web site visitor views particular web content.
● Net reach: This statistic measures the percentage of the target audience reached by a particular advertisement in each exposure.
● Frequency: Frequency refers to how many times a particular advertisement is exposed to the target audience over the marketing horizon.
● Gross rating points (GRP): This statistic measures the percentage of the target audience reached by a particular advertisement. When an advertisement is exposed to its target audience only once, GRP equals net reach. When there are multiple exposures, GRP equals the product of net reach and frequency (a numeric sketch follows this list).
● Number of clicks on an ad over a period of time.
● Click-through rate (also known as CTR) on an ad, defined as the number of clicks divided by the number of visitors over a period of time.
● Total number of page views over a given period of time.
● Average number of page views per individual over a given period of time.
● Average time spent viewing an ad per individual over a given period of time.
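A numeric sketch of how some of these pricing factors combine; the audience and click figures are hypothetical:

# Hypothetical campaign: the ad reaches 20% of the target audience, 3 exposures on average
net_reach = 0.20
frequency = 3
grp = net_reach * 100 * frequency          # gross rating points = net reach x frequency

# Click-through rate: number of clicks divided by number of visitors over the period
clicks, visitors = 1200, 150000
ctr = clicks / visitors

print(f"GRP: {grp:.0f}, CTR: {ctr:.2%}")   # GRP: 60, CTR: 0.80%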
Just as visitor traffic is essential to measure visitor awareness, its significance in driving online publishing and media advertising revenues cannot
be overemphasized. The reason is that data on visits can provide insight on
what categories of topics and content visitors are most interested in. Visitor
segmentation can be applied to the design of targeted ads. Publishers and
media site owners can impose a higher CPM (cost per thousand impressions) for site sections with more web traffic and more targeted audiences.
Content can be managed more effectively by analysis of metrics such as:
● Entry page
● Referral URL
● Exit page
● Most requested pages
● Most viewed topics, categories, or subcategories
● Most e-mailed or forwarded articles
● Article ratings
● Downloads
● Search keyword.
Lead generation
A lead is defined as an individual or a business that has a need for a product or service, and is serious about making a purchase. The purpose of a
lead generation effort is to uncover such individuals or businesses and to
convert them into buyers.
Number of leads, lead quality, and lead conversion rate are metrics
commonly used for measuring the performance of a lead generation
effort. The definition of a lead varies from company to company, and
most companies assign different grades to leads. A ‘grade A’ lead, for
example, usually has a shorter purchase time frame, a higher purchase
budget, or a better credit rating than a ‘grade B’ lead.
The following operational web metrics can provide insight on what
may cause changes in lead volume and lead quality. Some of the metrics
offer information on the types of products or services that a visitor looks
for, and can be leveraged to enhance the quantity and quality of leads.
● Entry page and referral URL: These two metrics measure sources of leads and how they impact lead quantity and quality. Sources such as search engines, online or print ads, partners, and affiliates that bring in more qualified leads are the sources that need to be emphasized through adequate investment.
● Exit page: This metric provides information on where visitors exit a web site. For qualified leads, their exit page is usually the lead form. Pages viewed prior to the lead form page are likely to have provided the visitor with the most appropriate information. For those who abandon the web site without becoming leads, their exit pages can provide insight as to why they drop off the process and what can be improved in terms of web page content.
● Most popular (most frequently traveled) navigation paths (also known as click streams): Analysis of navigation paths can provide insight on the visitor's thought process and lead to construction of a visitor profile (also known as a visitor persona). A visitor who visits a travel web site and looks for a family vacation package in the summer is likely to have kids in school, and therefore is likely to look for destinations that suit both adults and children during summer breaks.
● Most requested pages: These pages provide the content that is in highest demand.
● Keywords searched by qualified leads indicate the types of products and services that these leads look for. Keywords that bring in a high volume of qualified leads should be emphasized with more investment in bidding for and purchase of these keywords.
● Geography: Most web analytics tools provide geographic information on where visitors are physically located. Insight from geography allows for the design of more localized and better-targeted content.
● Lead demographic and attitudinal data: Demographic and attitudinal data collected through online registrations or lead forms provide additional data for lead profiling and segmentation analysis.
● Lead to click ratio: This ratio is an indication of click quality. Clicks that do not convert to leads come from individuals who browse the web site without serious intent of making a purchase.
Some web analytics tools are capable of tagging individual fields in a
lead form. By tracking where visitors drop off as they fill out a lead form,
lead form designers can optimize the forms to minimize abandonment
rate.
It may become a challenge to measure lead conversion if leads are passed on to a sales force for offline follow-up and purchases. Some web analytics
tool vendors have developed an automatic tracking process to tie online
leads with offline purchases. One example of this type of automation process is the collaboration between WebSideStory and salesforce.com.
E-commerce direct sales
Web analytics are excellent tools for tracking direct online purchases.
By monitoring how visitors complete or drop off from the purchase process, web sites can be optimized to increase online revenues.
Most of the key metrics tracking e-commerce performance can be
tracked directly or derived by web analytics tools. Here are just a few of
these metrics:
● Number of transactions
● Number of buyers
● Number of transactions per buyer
● Total sales
● Average sales per transaction
● Average sales per buyer
● Total profit margin
● Average profit margin per buyer
● Average profit margin per transaction.
Among operational metrics that can provide insight on what may drive
online e-commerce revenues are:
● Entry page and referral URL: These two metrics track the sources of buyers, and how these sources impact revenues. Sources that bring in more buyers are the sources that need to be supported with additional investment. Consider an example where the majority of online buyers of an e-commerce web site have visited a partner web site before entering the e-commerce site. The e-commerce web site owner may consider increasing his investment in this particular partner site or similar web sites.
● Exit page: As potential buyers arrive at the home page of an e-commerce site, they usually start with the main page of a product category and then venture into subcategory pages, place products in shopping carts, and, if they proceed with the purchase, check out. Visitor abandonment rate can be minimized by tracking where potential buyers drop off in the navigation process and by focusing on areas of the web site that require improvement.
● Exit field: Some web analytics tools are capable of tracking visitor behavior as the visitor navigates and fills out online order forms. Some potential buyers drop out of the purchase process because they are asked to fill out too much information on the order form. Web analytics tools enable tailoring of online order forms to increase the completion rate.
● Search keyword: By quantifying marketing returns by keyword, keyword price bidding can be managed more cost-effectively. For example, a keyword that brings in 1000 clicks, 100 buyers, $5000 in revenue, and $1500 in profits results in a profit of $1.50 per click. This implies that the bidding price, measured in cost per click (CPC), needs to be kept below $1.50 to make the investment cost-effective (see the sketch after this list).
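The keyword profitability arithmetic in the last bullet can be scripted directly; a minimal sketch using the figures from that example:

# Keyword performance from the example: 1000 clicks, $5000 in revenue, $1500 in profits
clicks, revenue, profit = 1000, 5000.0, 1500.0
profit_per_click = profit / clicks    # $1.50
max_cpc_bid = profit_per_click        # bid ceiling for the keyword to stay cost-effective
print(f"Profit per click: ${profit_per_click:.2f}; keep the CPC bid below ${max_cpc_bid:.2f}")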
Customer support and service
Increasingly, companies are incorporating customer support and service
functions into their web sites to reduce customer service cost and to better serve their customers. According to industry analysts, the cost per live
service phone call is ten times the cost per web self-service (Customer
Service Group 2005).
Cost per case closed is an example of the key metrics that measure the
performance of online customer service. Additional metrics include:
● Number of customers served in a given period of time
● Number of customer issues resolved or cases closed in a given period of time
● Improvement in customer satisfaction.
Web syndicated research
Just as syndicated research is widely available in the offline world, similar
research is also available for measuring online visitor behavior. Nielsen//
NetRatings (www.nielsen-netratings.com), ComScore (www.comscore.
com), Hitwise (www.hitwise.com), Dynamic Logic (www.dynamiclogic.
com), and other firms offer both syndicated research and customized
research on online behavior, industry analysis, competitive intelligence,
or market forecasts. Syndicated research companies collect, report on, and analyze a large number and a broad range of web sites to compile information about the online market. Online syndicated research can be classified into five categories; within each category, data may be dissected by type of web site, industry, and individual web site.
● Internet audience measurement: Syndicated research focusing on this category measures standard web traffic and visitor metrics. Nielsen//NetRatings' NetView (www.nielsen-netratings.com) and comScore's Media Metrix (www.comscore.com) offer this type of reporting and analysis through their subscription services.
● Intelligence on online advertising sales networks: This type of research focuses on tracking web traffic on advertising sales networks. Advertising.com (www.advertising.com) is an example of such advertising sales networks. comScore (www.comscore.com) provides this type of specialized syndicated research through its Media Metrix service.
● Profiling and targeting of online audience: Most major syndicated research providers offer target audience segmentation to make online advertising more effective. Take Nielsen//NetRatings' MegaPanel as an example (www.nielsen-netratings.com). This product combines Internet behavior with attitudinal, lifestyle, and product usage data from its panel members to infer the highest level of insight on visitor behavior.
● Industry and competitive intelligence: This kind of research enables comparison of the performance of one site against the performance of competing sites, and provides industry benchmark data. Hitwise's Industry Statistics (www.hitwise.com) offers both Internet behavior and demographic profiles for over 160 industries.
● Performance of online advertising campaigns: Dynamic Logic's MarketNorms reports (www.dynamiclogic.com) are based on data collected over a large number of online campaigns.
■ References
Web Self-Service Improves Support, Cuts Costs. Customer Service Group Newsletter, New York, 2005.
CHAPTER 12
Search Marketing Analytics
Search marketing, a specialized area of Internet marketing, has become
a key marketing vehicle over the past years due to its cost effectiveness
in generating leads and revenue. Appropriate web analytics tools need
to be deployed to track the performance of a search marketing program.
Analytic output from such tools is then used to optimize future search
strategies. Without proper tracking and analysis, a search marketing program is likely to be a costly and wasteful effort.
In Chapter 3, we briefly introduced search marketing as a marketing
communication channel. This chapter provides a more in-depth discussion specifically on three types of search: organic search, paid search, and
onsite search. The chapter also directs the reader to numerous websites
offering free search marketing analytic tools and tips.
■ Search engine optimization overview
The interface of a search engine is a web page where a user can enter a
keyword to observe relevant results. The results are displayed as listings
of website links on a search engine result page (SERP). There are two
types of listings, free listings and paid listings (Figure 12-1).

Figure 12-1 Listings and ad copies in a result page.

Free listings are listings of website links that are displayed free of charge to website
owners. Paid listings are listings of website links shown at a cost to website owners. Paid listings sometimes are referred to as sponsored listings
and the web links they contain are sponsored links. An organic search, or
a natural search, is a process by which a user enters a keyword and finds relevant free listings. A paid search is a process where a user enters a keyword and finds relevant sponsored listings.
Search engine optimization (SEO) is a discipline that has been developed to help website owners optimize their websites so that their website links appear prominently in free listings displayed in SERPs. An
ad copy is a brief description of the content of a website. Very often each
website link on a result page is immediately followed by an ad copy as
illustrated in Figure 12-1.
Search engines such as Google and Yahoo use proprietary algorithms
that apply different weights to a spectrum of factors in order to evaluate
the relevance of a website to a particular search keyword. These factors
include frequency of keywords appearing on a website (keyword density)
and link popularity, determined by number of inbound links (backlinks)
to a website. Forward links (outbound links) are the opposite of backlinks.
Forward links refer to links directing visitors out of a website. Google
uses a proprietary algorithm named PageRank (Page, Brin, Motwani, and
Winograd 1998) that generates a numeric value to measure the importance
of a site. The algorithm by now has over 100 variables (Davis 2006). The
higher the PageRank output value of a website is, the more likely the
website is to appear in a prominent (or high ranking) position in Google
search result pages. Search engines index billions of websites and display
those that are most relevant in response to a keyword search. To create
keyword search results, search engines run automatic programs called
robots (also known as spiders or crawlers) on their indices to identify
contents most relevant to keywords.
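The core idea behind PageRank can be illustrated with a simplified power-iteration sketch; the link graph and damping factor below are illustrative assumptions, and, as noted above, the production algorithm combines many more variables:

# A minimal power-iteration sketch of the core PageRank idea (illustrative only)
def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to (its forward links)."""
    pages = list(links)
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / len(pages) for p in pages}
        for page, outlinks in links.items():
            if not outlinks:                  # dangling page: spread its rank evenly
                for p in pages:
                    new_rank[p] += damping * rank[page] / len(pages)
            else:
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:       # each backlink passes a share of rank
                    new_rank[target] += share
        rank = new_rank
    return rank

# Hypothetical four-page link graph: page C has the most backlinks
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
print(pagerank(links))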
SEO requires specialized technical knowledge and experience in creating content relevant to keywords. Although search engine algorithms are
proprietary and well-guarded trade secrets, there are general rules disseminated by search engine firms and search marketing experts on how
to make website content more relevant to search keywords.
Optimization following these generally accepted rules without compromising the quality of the website message and content is referred to
as white hat optimization. Black hat optimization is the opposite of white
hat optimization. Black hat optimization exploits potential weaknesses of
the generally accepted rules to achieve high rankings. Tactics such as creating numerous irrelevant backlinks to a website is black hat optimization. Website owners that practice black hat optimization run the risk of
having their websites or pages de-listed by search engines. Given these
facts, caution needs to be exercised when practicing SEO.
Website owners can submit their websites' URLs or web pages directly to search engines' submission pages such as the following (www.searchengine.com).
● http://www.google.com/addurl.html
● submit.search.yahoo.com/free/request
● search.msn.com/docs/submit.aspx
An alternative way for a website to get into search engines’ listings
is for a website owner to submit a website to a directory (Davis 2006).
A directory organizes websites by placing them in different categories
and subcategories. One of the most important directories is the Open
Directory Project, or ODP (http://www.dmoz.com).
To implement SEO, an organization can choose to invest in developing in-house expertise or outsource the optimization task to an outside
consultant or consulting firm. In either scenario, the first step in the SEO
process is to identify the website objectives. The objectives of a website
may be to increase visitor traffic or to generate e-commerce sales. The
second step in SEO is to conduct a site analysis to fully understand the
strengths and weaknesses of the website. The final step in SEO is to identify the gap between the website’s current capability and its objectives.
There are numerous online tools that allow for free site analysis for SEO.
The following are some examples.
● SEO Chat (http://www.seochat.com/seo-tools) offers many free SEO tools for quick diagnoses of websites and keywords.
● SEO Today (http://www.seotoday.com) is a website that offers a lot of SEO-related information and resources.
● Search Engine Marketing Professional Organization, or SEMPO (http://www.sempo.org)
● SEO-PR (http://www.seo-pr.com)
● Search Engine Watch (http://www.searchenginewatch.com)
● SES/Search Engine Strategy (http://www.searchenginestrategies.com)
● MarketingSherpa (http://www.marketingsherpa.com)
● dwoz.com (http://www.dwoz.com/default.asp?Pr=123)
● webuildpages.com (http://www.webuildpages.com/tools/default.htm)
Site analysis
The objective of site analysis is to measure the website features and properties that are of importance to search engines, such as keyword density
and link popularity. Keyword (search term) tracking and research tools
provide information on keyword search frequency, keyword suggestions and keyword density. Keyword search frequency is the number of
searches for a particular keyword for a given period of time. Some online
site analysis tools generate keyword suggestions based on factors such
as keyword popularity. The following links, grouped by topic, contain
information useful to website owners looking to optimize the contents of
their sites.
● Keyword popularity and frequency
  – http://www.wordtracker.com
● Keyword suggestions
  – http://www.adwords.google.com/select/KeywordToolExternal
  – http://tools.seobook.com/keyword-tools/seobook
● Graphically display keyword density on websites
  – http://www.seochat.com/seo-tools/keyword-cloud
● Keyword density on a web page
  – http://www.seochat.com/seo-tools/keyword-density
● Keyword trends and top keywords
  – http://www.about.ask.com/en/docs/iq/iq.shtml
  – http://www.dogpile.com/info.dogpl/searchspy
  – http://www.google.com/press/zeitgeist.html
  – http://www.50.lycos.com
  – buzz.yahoo.com
  – http://www.google.com/trends
● Competition finder for measuring the number of pages containing selected keywords
  – http://www.tools.seobook.com/competition-finder/index.php
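Keyword density itself is simple to compute once the page text is available; a minimal sketch (the tokenization rule and the sample text are illustrative assumptions):

import re

def keyword_density(page_text, keyword):
    """Percentage of words on the page that match the keyword."""
    words = re.findall(r"[a-z0-9']+", page_text.lower())
    if not words:
        return 0.0
    hits = sum(1 for w in words if w == keyword.lower())
    return 100.0 * hits / len(words)

print(keyword_density("Data mining tools for marketing. Mining transaction data.", "mining"))
# 2 of the 8 words match, so the density is 25.0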
Domain analysis tools provide information about a particular domain
name such as its availability, its host name, and its administrative contact.
The following are some useful online domain analysis tools.
● Tracking availability and status of a particular domain name
  – http://www.whois.com
● Tracking properties of a particular domain name
  – http://www.DNSstuff.com
● Tracking the code to text ratio: The code to text ratio is the percentage of text in a web page. The higher the code to text ratio is, the more likely the page is to acquire higher rankings in search engine listings.
  – http://www.seochat.com/seo-tools/code-to-text-ratio
Link popularity analysis tools measure the popularity of a website in
terms of the number of links pointing to it (backlinks). The more backlinks a website has, the more popular it is and the more likely it is to
achieve higher rankings with key search engine listings. Websites such as
those listed below provide information on link research and tools.
● Evaluating website popularity in terms of number of backlinks with key search engines
  – http://www.seochat.com/seo-tools/link-popularity
  – http://www.seochat.com/seo-tools/multiple-datacenter-link-popularity
● Reporting backlinks of a website by search engine
  – http://www.thelinkpop.com
● Reporting forward links or links inside a site (internal links)
  – http://www.seochat.com/seo-tools/site-link-analyzer
● Tracking the Google PageRank of a website: The higher the PageRank of a website is, the more popular the website.
  – http://www.seochat.com/seo-tools/pagerank-lookup
  – http://www.seochat.com/seo-tools/pagerank-search
● Checking broken redirect links of a URL
  – www.seochat.com/seo-tools/redirect-check
● URL rewriting (conversion of a dynamic URL to a static URL): Static URLs are more likely than dynamic URLs to acquire higher rankings in search engine results. The content of a static URL changes only if its HTML code is modified. The content of a dynamic URL changes when there is a change in the queries that generate the content of the web page. The following website offers a free tool for converting a dynamic URL to a static URL.
  – http://www.seochat.com/seo-tools/url-rewriting
A meta tag is an element in the head section of a web page HTML
code that describes the content of the page. Meta tags are invisible to
web page viewers but visible to search engines. Although search engines
used to rely heavily on meta tags for determining the themes of web
pages, the importance of meta tags has decreased lately (Davis 2006).
However, it is still a good practice to make sure that meta tags accurately
reflect the contents of their corresponding web pages. Free online tools
such as those posted on the following websites are available for analysis
of meta tags.
● Displaying meta tags of a web page
  – http://www.seochat.com/seo-tools/meta-analyzer
● Automatic meta tag generator
  – tools.seobook.com/meta-medic
  – http://www.seochat.com/seo-tools/meta-tag-generator
  – http://www.seochat.com/seo-tools/advanced-meta-tag-generator
A web page title tag describes the name of a web page and resides in
the head section of a web page HTML code. The name of a web page,
or a web page title, appears at the very top of a web page when a web
browser is activated. When a user bookmarks a web page, the web page
title appears as the bookmark name. Very often a web page title, instead
of the actual website URL, is listed in SERPs. Web page title tags help
search engines discern the content of the corresponding web page.
Alt text, which stands for alternate text, is text that is displayed when
an image it describes cannot be shown due to technical reasons. Like
meta tags, alt text is usually not visible to viewers but is visible to search
engine crawlers. Search engines utilize alt text to extract the theme of a
web image. Embedding keywords in alt text is a frequently used strategy
for improving website PageRanks.
SEO metrics
SEO return metrics are determined by the business objectives of a website.
In Chapter 3, we briefly discussed how return metrics vary from stage to
stage of a sales funnel. The same principle applies to SEO metrics.
● In the awareness stage, the main objective of SEO is to ensure that a website acquires high listing rankings with search engines in order to gain as many impressions as possible. Some search engines can provide impression data if requested. The number of impressions, an approximation to the number of search engine users who enter a particular keyword and observe a website link in a SERP, is a common return metric at this stage.
● In the interest and relevancy stage, search engine users who view and click on a website listing become visitors to the website. In this stage, the key return metrics are the number of clicks, the number of visitors to a website, and the number of responders to marketing offers such as joining a membership program on a website.
● In the consideration stage, visitors become leads by responding to marketing offers such as filling out a survey form. The number of leads generated by a marketing offer is a common return metric in this stage.
● In the purchase stage, website visitors who purchase products online are buyers. Widely adopted return metrics in this stage are the number of buyers, the number of transactions, and the amount of transactions in a period of time.
● In the loyalty and referral stage, frequently used return metrics are the number of repeat purchases, the number of referrals, and customer satisfaction scores defined and measured by individual website owners.
A variety of operational metrics can be tracked and improved to
enhance the performance of a website and impact the success of a marketing effort. In the context of SEO, operational metrics are often
measures of website visitor quantity or quality. The following are some
relevant operational metrics.
● Keyword position (ranking): Keyword position (ranking) is an operational metric for measuring the visibility of a website. High visibility drives high volume of visitors to a website.
● Number of backlinks: This metric influences the rankings of a website with search engines and consequently impacts the volume of visitors to a website.
The quality of website content and the effectiveness of ad copies are
important qualitative factors in the success of an SEO effort. Website quality has an impact on the quantity and quality of visitors, attracting repeat and high-quality users. Ad copies, on the other hand, influence the user's
decision to click on a particular link in a listing.
■ Search engine marketing overview
Search engine marketing (SEM) refers to maximizing returns on investment in paid search keywords. According to comScore (http://www.comScore.com), the US search market in April 2006 was dominated by Google
(43.1%), Yahoo (28.0%), MSN (12.9%), Time Warner/AOL (6.9%), Ask
(5.8%), and myspace.com (0.6%).
The following are some of the major paid search programs currently in the United States.
● Google AdWords (adwords.google.com/select/Login)
● Yahoo Search Marketing (searchmarketing.yahoo.com)
● Microsoft adCenter (msftadcenter.com)
● Ask (sponsoredlistings.ask.com)
To engage in paid search marketing, a website owner needs to enroll
in the paid search program of a search engine firm and agree to pay a fee
for each click on his website link. This fee is called Pay Per Click (PPC) or
Cost Per Click (CPC). PPC is a function of keyword popularity and keyword position (ranking). The more popular a keyword is or the more visibly a website link appears on keyword search result pages, the higher a
keyword PPC is. PPC is subject to bidding and as a result changes frequently. Special software has been developed to automate management
of keyword bidding by paid search program participants. Automatic
bid management systems allow for management and tracking of keyword bidding by paid search program participants across multiple search
engine platforms. Vendors such as Google, Yahoo, and Omniture provide
bid management systems.
SEM resources
Keyword positions are important since listings with top positions (or high
rankings) are more visible to users and therefore attract more impressions
(visitor eyeballs). The following websites offer SEO keyword evaluation
tools that can also be applied to SEM.
● dwoz.com (http://www.dwoz.com/default.asp?Pr=123)
● http://www.webuildpages.com
● Measuring keyword popularity and frequency
  – http://www.wordtracker.com
● Keyword suggestions based on search frequency
  – adwords.google.com/select/KeywordToolExternal
  – tools.seobook.com/general/keyword
● Graphically display keyword density on websites
  – http://www.seochat.com/seo-tools/keyword-cloud
● Measuring keyword density on a web page
  – http://www.seochat.com/seo-tools/keyword-density
● Keyword position tools for top keywords or keywords of interest
  – about.ask.com/en/docs/iq/iq.shtml
  – http://www.dogpile.com/info.dogpl/searchspy
  – http://www.google.com/press/zeitgeist.html
  – 50.lycos.com
  – buzz.yahoo.com
● Keyword popularity
  – google.com/trends
● Competition finder measuring the number of pages containing chosen keywords
  – tools.seobook.com/competition-finder/index.php
Websites offering general information on SEM include
● Search Engine Watch (http://www.searchenginewatch.com)
● Search Engine Strategy, or SES (http://www.searchenginestrategies.com)
● SEMPO: Search Engine Marketing Professional Organization (http://www.sempo.org/home)
● MarketingSherpa (http://www.marketingsherpa.com)
SEM metrics
Unlike SEO where most costs are fixed, SEM costs are mainly variable
costs originating in keyword purchases and management. PPC is an
example of SEM metrics.
Analysis of metrics at the keyword level, rather than analysis based
on aggregate data across keywords, is crucial for measuring the cost efficiency of paid search marketing. SEM is a discipline that requires specialized technical knowledge and experience in keyword bidding and
management. Without proper keyword bidding and management strategies, SEM can become costly and ineffective.
Like SEO, SEM return metrics differ according to the business objectives and by where the audience is in the sales funnel. The same success
metrics for SEO can be applied to SEM.
Most paid search programs such as Google AdWords provide campaign management reports that allow paid search program participants
to track operational metrics such as keyword position and clickthrough
rate, where the latter is defined as the ratio of the number of clicks to the number of impressions.
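A minimal sketch of keyword-level SEM reporting along these lines; the campaign figures below are hypothetical:

# Hypothetical keyword-level paid search data: impressions, clicks, spend, and revenue
keywords = {
    "retirement planning": {"impressions": 12000, "clicks": 240, "spend": 300.0, "revenue": 900.0},
    "mutual funds":        {"impressions": 45000, "clicks": 450, "spend": 900.0, "revenue": 750.0},
}

for kw, m in keywords.items():
    ctr = m["clicks"] / m["impressions"]   # clickthrough rate = clicks / impressions
    cpc = m["spend"] / m["clicks"]         # effective cost per click
    roi = (m["revenue"] - m["spend"]) / m["spend"]
    print(f"{kw}: CTR {ctr:.2%}, CPC ${cpc:.2f}, ROI {roi:.0%}")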
■ Onsite search overview
In contrast to SEO and SEM, which focus on search for information residing across multiple websites over the Internet, onsite search is search for
information contained within one website. Onsite search has not gained
as much traction as SEO or SEM (Inan 2006), a reality that is quite unfortunate given the importance of onsite search. More resources appear to
have been devoted to SEO and SEM than to onsite search. As a result,
onsite search is rarely effective. It is ironic that significant investments are
made in SEO or SEM to attract as many visitors as possible to a website
but when they arrive, they may find the site navigation and onsite search
frustrating and as a result they exit the website as quickly as possible.
Onsite search is particularly important to informational sites such as
CNN.com and for e-commerce sites such as Amazon.com and eBay.com.
To infer the degree of importance of onsite search to a particular website,
we can track metrics such as the percentage of visitors who utilize onsite
search or who make decisions on the basis of onsite search. Research
shows that the percent of utilization of onsite search (utilization rate of
onsite search) ranges from 5 to 40% of the number of visitors (Inan 2006).
A low utilization rate of onsite search may indicate either that onsite
search is not essential to website visitors because the website content is
well organized, or that onsite search is not easy to use due to various
factors. If needed, website owners may consider conducting a visitor survey to understand the reason for low utilization in onsite search.
Google (http://www.google.com), Atomz (http://www.atomz.com),
and Mondosoft (http://www.mondosoft.com) are among the application
service providers (ASP) offering onsite search services.
Visitor segmentation and visit scenario analysis
To effectively assist website visitors with their onsite navigation experience, website owners need to understand what audience segments visit
their websites and what their profiles are. Based on visitor segmentation
and profile information, site owners can successfully serve their visitors
by generating relevant content and effectively organizing their message.
Analysis of onsite search keywords can generate insight on what content might be missing on a website or might be hard to find by website
visitors. In other words, an onsite search function can be either an enabling tool for website visitors to find what they need or a research tool for
a website owner to understand their website visitors’ needs.
It is challenging to meet every single visitor's needs, so site owners
need to focus on their core audiences. As a result, many website owners pre-segment their visitors to ensure that their core audiences are well
served by their sites. Most site owners appear to be aware of the need for
segmentation and have incorporated basic segmentation into their web
designs. The four common segmentation approaches are by demographics, products, verticals (industries), and geography. Many websites are
structured based on one or more of these four approaches.
Let’s consider a few high-tech company websites and their visitor segmentation schemes. The networking giant Cisco has a very user-friendly
website (http://www.cisco.com/). The four basic visitor segments are
small and medium businesses, large enterprises, service providers, and
homes and home offices. The site also has a products and services section to help visitors locate their products and services of interest. The
mobile phone company Motorola has five visitor segments classified on
its animated website (http://www.motorola.com): consumers, businesses,
governments, service providers, and developers. The company’s website
also has a products and services section with detailed information about
the company’s key products and services by geography. The website of
Hewlett Packard (http://www.hp.com) displays five visitor sections catering to its five target customer segments: homes and home offices; small and medium businesses; large enterprise businesses; governments, health and education; and graphic art. These customer segments are fairly common in the high-tech industries due to the fact that technology solutions
need to be customized to address the infrastructure and needs of businesses and consumers.
Financial firms often organize their website contents by products and
reasons for visits. The reasons for visit are mainly aligned with life stage
events. The Bank of America website (http://www.boa.com) exhibits segmentations by demographics, products and services, and reasons for visit.
Three basic demographic segments are visible on the BOA site: personal
(consumers), small businesses, and corporations and institutions. The website also contains information about the bank’s core products and services.
In addition, the website consists of contents organized by life stage
events such as buying a home, purchasing a car, and planning for college.
Information on life events often resonates well with consumer visitors
as it speaks to their personal life experiences that are tied with financial
needs. In addition, the site offers a visible section for existing customers
to manage their accounts.
Let’s now consider a website example in the manufacturing industry.
The Toyota (http://www.toyota.com) website lists Toyota’s various car
types (cars, trucks, SUVs/vans, and hybrids) as well as links to sections
organized by reasons for visiting the site, under a header of ‘Shopping
Tools.’
● Build and Price Your Toyotas
● Comparisons by Edmunds
● Find Local Specials
● Explore Financial Tools
● Request a Quote
● Locate a Dealer
● Request a Brochure
● Certified Used Vehicles.
Creating distinct sections in a website to cater to various visitor segments may enhance, but does not guarantee, a satisfactory visitor experience. Consider the following scenario. A visitor visits a high-tech website
and searches for an answer to a question regarding the expansion of his
software license agreement. This visitor is a small business owner in the
retail industry residing in the US. He can easily drill down to the specific
demographics (small business), geography (US), product (software), and
vertical (retail) sections of the website but may still get overwhelmed by
the amount of information made available by the search. The visitor may
need to resort to onsite search to find the answer to his question. In this
instance, an onsite search function is ideal in guiding the visitor to the
information he needs.
Success metrics for onsite search are determined by marketing objectives and are similar to those for SEO and SEM. Keyword-level analysis is crucial for aligning an onsite search function with website content. It is a misconception that the more information there is on a site, the better the visitor experience. Sites are frequently cluttered with information, some of which is completely irrelevant to visitors. Through manual web log file analysis or the implementation of advanced web analytics, website owners can extract the most popular keywords, measured by search frequency, and use this information as the basis for enhancing or eliminating particular components of the website content. In addition to keyword search frequency, the following metrics can be used to optimize onsite search; a simple computation sketch follows the list.
● Number of searches per visitor per session: This metric measures the number of searches a visitor conducts during a single session on a particular website.
● Number of attempts per search: This metric measures how many times a visitor enters similar phrases or keywords in attempting to find relevant content.
● Number of attempts before clicking on search results: This metric measures the number of attempts a visitor makes before clicking on any of the links in the search results. A lower number indicates higher site efficiency.
● Customer satisfaction: measured on a quantitative scale, such as a five-point scale with five being extremely satisfied and one being extremely dissatisfied.
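To make these metrics concrete, here is a minimal sketch (in Python, not taken from the book) of how keyword frequency, searches per session, and attempts before a result click might be computed from a raw onsite-search log. The file name onsite_search_log.csv and the columns visitor_id, session_id, query, and clicked_result are hypothetical placeholders for whatever an analytics export actually provides; customer satisfaction would come from a survey rather than from such a log.

# Minimal sketch: onsite-search metrics from a hypothetical search log.
# Assumed CSV columns: visitor_id, session_id, query, clicked_result
# ("1" if the visitor clicked any result for that search). Rows are
# assumed to be in chronological order within each session.
import csv
from collections import Counter, defaultdict

keyword_counts = Counter()                 # search frequency per keyword/phrase
searches_per_session = Counter()           # (visitor, session) -> number of searches
attempts_before_click = defaultdict(int)   # searches logged before the first click
clicked_sessions = set()                   # sessions in which a result click occurred

with open("onsite_search_log.csv", newline="") as f:
    for row in csv.DictReader(f):
        session = (row["visitor_id"], row["session_id"])
        keyword_counts[row["query"].strip().lower()] += 1
        searches_per_session[session] += 1
        # Count attempts until the first click; sessions with no click at all
        # end up counting every search as an attempt in this simple sketch.
        if session not in clicked_sessions:
            if row["clicked_result"] == "1":
                clicked_sessions.add(session)
            else:
                attempts_before_click[session] += 1

# Most popular onsite-search keywords: a candidate basis for enhancing or
# eliminating particular components of the website content.
print("Top onsite-search keywords:", keyword_counts.most_common(10))

if searches_per_session:
    sessions = list(searches_per_session)
    avg_searches = sum(searches_per_session.values()) / len(sessions)
    avg_attempts = sum(attempts_before_click[s] for s in sessions) / len(sessions)
    print(f"Average searches per visitor per session: {avg_searches:.2f}")
    print(f"Average attempts before clicking on a search result: {avg_attempts:.2f}")

The output is a ranked list of the most frequent search keywords plus the two session-level averages; in practice, numbers like these feed the decisions about enhancing or eliminating website content described above.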
■ References
Davis, H. Google Advertising Tools. O’Reilly Media, Inc., 2006.
George, D. The ABC of SEO: Search Engine Optimization Strategies. Lulu Press, 2005.
Inan, H. Search Analytics: A Guide to Analyzing and Optimizing Website Search Engines. Hurol Inan, 2006.
Page, L., S. Brin, R. Motwani, and T. Winograd. The PageRank Citation Ranking: Bringing Order to the Web. Stanford Digital Library Technologies Project, Stanford University, 1998.
Index
A
Acceptance area, of probability
density curve, 130
Active listening, 12
Ad copy, 258
Addressable market, 76
Advertising
determining financial returns of, 42
impact on generating marketing
returns, 28
online. See Online advertising
Advertorial, 52
AdWords, of Google, 265
AID. See Automatic Interaction
Detection (AID)
Akaike’s Information Criterion (AIC)
statistic, 196
Alternative hypotheses, 130
Alt text, 262
American Express credit card. See
Credit cards
Analysis of dependence, 145
Analysis of interdependence, 145
Analysis of Variance (ANOVA),
172–175
ANOVA. See Analysis of Variance
(ANOVA)
Applet, 52
Applications, of data mining
for cross-selling, 140
for customer acquisition, 140
for customer attrition, 140
for customer profiling, 140
for customer segmentation, 139
for fraud detection, 140–141
for inventory optimization, 140
for marketing program performance
forecasts, 140
for personalized messages, 140
Applied Correspondent Analysis, 164, 168
Arbitron, 108
ARIMA, in time series analysis, 183,
184
Arithmetic mean, 120
Association analysis, 190
defined, 235
for e-commerce, 237–241
for online advertising, 241–244
Association Engine. See also
Association analysis
analysis output section, 235–236
data summary section, 236
EXCEL interface of, 235–236
input section of, 235–236
Attrition model, for customer
retention, 227–229
Audience segmentation. See Customer
segmentation
Autocorrelation function (ACF), in
time series analysis, 182–183,
185
Automatic Interaction Detection
(AID), 157
Autoregressive (AR) model, in time
series analysis, 180–181
Auto-regressive moving average
(ARMA) models, 23
B
Bank of America, 267
credit card. See Credit cards
Banners. See Online banner
advertisement
Behavior and demographics
segmentation
cluster analysis for, 206
model building, 196–201
model validation, 201–205
Binary variable, 117
Binomial distribution, 123
Black hat optimization, 258
Blogs, 60
Brand awareness
online survey for, 248
proxy metrics for, 248–249
Brand equity
definition of, 8
value, 57
Brand recognition, 94–95
Broadcast channels, 51
Budget allocation, fractional, 31
Bundled marketing, 235. See also
Cross-selling strategies
Bureau of Economic Analysis, 77
Business objectives
identification of, 45–46, 102
in research report, 112
Business reply, card and envelope, 55,
106
C
Call centers, phone interview in, 107
Canonical correlation analysis,
175–176
CART. See Classification and
Regression Tree (CART)
Case studies
on behavior demographics
segmentation. See Behavior
demographics segmentation
on cross-sell. See Cross-selling
strategies
on customer acquisition, 221–227
on customer growth model,
229–231
on customer retention, 227–229
on customer satisfaction
segmentation. See Customer
satisfaction segmentation
on response segmentation, 208–210
on value segmentation, 205–208
CATI. See Computer assisted
telephone interviewing (CATI)
Central limit theorem, 118
CHAID. See Chi-Square Automatic
Interaction Detection (CHAID)
Channel partners, 92
Chi-Square Automatic Interaction
Detection (CHAID), 158–160,
168, 195
for response segmentation
model building, 209–210
model validation, 210–211
Chi-square (χ2) distribution, 124–125
Classification and Regression Tree
(CART), 70, 160–162, 195
for customer segmentation, 90
for value segmentation
model building, 207–208
model validation, 208
Click-through rate (CTR), 67, 250
Cluster analysis, 195
for behavior and demographics
segmentation, 206
model building, 196–201
model validation, 201–205
description, 151
hierarchical methods in
agglomerative, 152–157
divisive methods, 157–162
partitioning methods, 162–163
similarity measures, 152
Collaborative filtering, 190–192
Communication channels, 49–51
determination of, 14
multiple, 30
reporting campaign performance
on, 65–67
types of, 12–13
Competitive intelligence, analysis
of, 8
Compound annual growth rate
(CAGR), 80
Computer assisted telephone
interviewing (CATI), 107
comScore report, 75–76
Conjoint analysis, 186, 187–188
Constant elasticity model, 24–25
Contingency tables, 128
Continuous variable, 117
Control group, defined, 134
Corporate finance, and marketing
models, 35–37
Corporate investors, 36
Corporate profits, 79
Corporate websites, 54–55
Correlation coefficient, between two
variables, 126
Correspondence analysis, 164,
168–172
Covariance, 126
Cramer’s coefficient, 129
Credit cards, 80
Cross-selling, 140
Cross-selling strategies
defined, 235
in e-commerce. See E-commerce,
cross-selling strategies in
at Horizon
association model for, 237–239
model validation, 239–241
at Netting Inc., 241–242
association model for, 242–243
validation results, 243–244
in online advertising. See Online
advertising
Customer acquisition, 140
direct mailing for. See Direct
mailing, for customer
acquisition
Customer attrition, 140
Customer growth model, 229–231
Customer profiling, 140
Customer retention
attrition model for, 227–229
Customer satisfaction, metrics. See
Proxy metrics
Customer satisfaction segmentation
attributes in, 212
discriminant analysis for
model building, 212–213
validation, 213–217
Customer segmentation, 71, 139
behavior demographics
segmentation. See Behavior
demographics segmentation
response segmentation, 208–210
satisfaction segmentation. See
Customer satisfaction
segmentation
value segmentation, 205–208
Customer support, web analytics in,
253
Customized research. See also
Marketing research
for collecting information of market
size, 75
planning, 102–105
vs. syndicated research, 101–102
D
Database systems, integrated, 10
Data exploration, 146
Data interval, 35
Data mining
applications of
for cross-selling, 140
for customer acquisition, 140
for customer attrition, 140
for customer profiling, 140
for customer segmentation, 139
for fraud detection, 140–141
for inventory optimization, 140
for marketing program
performance forecasts, 140
for personalized messages, 140
defined, 6, 139
skills required for conducting, 14
stepwise thought process for
actionable recommendations, 145
analysis conduction, 144–145
business area determination, 142
business issues into technical
problems, translation of,
142–143
data sources identification,
143–144
goal identification, 141–142
tool and technique selection, 143
techniques. See Data mining
techniques
Data mining techniques
analysis of dependence, 145
analysis of interdependence, 145
ANOVA, 172–175
association analysis, 190
cluster analysis. See Cluster analysis
collaborative filtering, 190–192
conjoint analysis, 186, 187–188
correspondence analysis, 164,
168–172
data exploration, 146
discriminant analysis, 166–168
linear regression analysis
multiple linear regression,
149–151
simple linear regression, 146–149
logistic regression, 188–190
multi-dimensional scaling (MDS),
176
metric MDS, 177, 178–179
Data mining techniques (Continued)
principal component analysis, 163,
165
time series analysis. See Time series
analysis
Data quality, 11
Debt, 36
Demographic and behavior
segmentation. See Behavior and
demographics segmentation
Dependent variable, 126
Diner’s Club credit card. See Credit
cards
Direct mail
market survey by, 106
online advertising by using, 55
operational metrics for, 67
Direct mailing, for customer
acquisition, 221–222
purchase models in
financial impact of, 226–227
prospect scoring, 227
purchase after receiving catalog,
222–224
purchase without receiving
catalog, 224–225
Direct sales, 92
Discrete variable, 117
Discriminant analysis, 166–168, 195
for customer satisfaction
segmentation
model building, 212–213
validation, 213–217
Divisive methods, in cluster analysis,
157–162
AID, 157
CART, 160–162
CHAID, 158–160
Domain analysis tools, 260
Dynamic models, 34–35
E
E-commerce
cross-selling strategies in
association model for, 237–239
validation results, 239–241
web analytics in, 252–253
Economics skills, for data mining, 14
Elasticity, of dependent and
independent variables, 23–34
E-mail
conducting market survey by using,
106
online advertising by using, 55
operational metrics for, 68
Engagement process, 14
Equity, 36
Exchange rate, 79
Expectation, of random variable, 119
Experimental design, 134–135
Explanatory/predictive variables, 23
Exponential distribution, 124
F
Face-to-face interviews, 108
Factor analysis, 165–166
F distribution, 125
Fixed costs, of investment, 33
Floating ad, 53
Focus groups, 109
Forrester Wave, 96
Fraud detection, data mining for,
140–141
Free listings, 257–258
G
Gamma function, 124–125
GDP. See Gross domestic product
(GDP)
Geometrically distributed lag model,
35
Geometric mean, 120
Global Village, on value segmentation.
See Value segmentation
Goodman-Kruskal’s coefficient, 130
Google
AdWords, 265
market share of, 76
proprietary algorithm of, 258
search engine marketing on, 53
Grid, 96
vs. perceptual map, 98
Gross domestic product (GDP), 77
Gross national product (GNP), 77
Gross profit, defined, 27
Gross rating points (GRP), 250
H
Hierarchical methods, in cluster analysis
agglomerative, 152–157
Ward’s ESS, 156–157
divisive methods
AID, 157
CART, 160–162
CHAID, 158–160
Horizon, cross-selling strategies at,
236–237
association model, 237–239
validation results of, 239–241
HTML code, 261
Huber Sigma Corporation, 41
I
Independent variable, 126
Indirect sales, 92
Inflexion point, 26
Information technology, skills for data
mining, 14
Insta-Vue, 109
Insurance purchase, 229–231
Interest rate, 79
Internet advertisement. See Online
advertising
Internet marketing
corporate websites, 54–55
email, 55
newsletters, 55–56
online advertising, 51–53
search engine marketing, 53–54
webinars, 56
Interstitial ads, 52
Investment
fixed cost, 42–43
measurement with metrics, 42–43
nonresidential equipment, 77
software, 77
variable cost, 43
Investors, corporate, 36
J
JupiterResearch, 247
K
Kendall’s coefficient, 127–128
Kendall-Stuart’s coefficient, 130
Keywords, 259–260
evaluation tools, 264
Kurtosis, 122
L
Lag model, 34
geometrically distributed, 35
Lags, 34
Leaderboard, 52
Leads, 34
defined, 241
generation of, 250–252
metrics, 58
Linear model, 24
Linear regression analysis, 25,
167–168
multiple linear regression, 149–150
F statistic, 151
R2 measures, 150–151
simple linear regression, 146–148
key assumptions for, 149
Link popularity analysis tools,
260–261
Logistic model, 32
Logistic regression, 70, 188–190
LTV (lifetime value) metrics, 44
M
Magazines, 51
Magic Quadrant, 96
Manufacturing industry, website in,
267
Market growth
affected by
corporate profits, 79
currency exchange, 79
emerging technologies, 79–80
fluctuations in oil prices, 79
GDP growth, 77
interest rates, 79
political uncertainty, 77
unemployment, 79
trends, 80
Marketing
channels, 30
effort, 36
executives, 5
Marketing (Continued)
intelligence
database systems, 10
defined, 10
investments, 21, 36
messaging, 70
plan. See Marketing plan
products, 31
research. See Marketing research
research companies, 88
returns. See Marketing returns
Marketing plan
based on market segmentation,
85–88
incorporating learning into, 10
incorporating market opportunity
information into, 83–84
objective of, 35
Marketing research
companies, 88
defined, 4
groups, 5
key components, 4
planning and implementation, 101
primary vs. secondary data, 105–106
report and presentation, 112–113
skills required for conducting,
13–14, 100
survey, 106–108
syndicated vs. customized, 101–105
types of, 13
Marketing returns, 21
affected by environmental changes,
33
impact of advertising on generating,
28
Marketing spending, 21
models. See Marketing spending
models
by multiple communication
channel, 30–31
by multiple products, 31–32
optimization of, 9, 27–29
Marketing spending model, 21–23
and corporate finance, 35–37
dynamic, 34–35
static, 23–34
Market opportunity
assessment of, 8
by market growth, 80
by market share, 80–81
by market size, 75–76
competitive analysis of, 8–9
impact of macroeconomic factors
on, 77–80
Market response model. See Marketing
spending model
Market segmentation
by market share, 82–83
by market size and growth, 81–82
Market share, of a company
of search engine companies, 76
in terms of revenues, 80
in terms of units sold, 81
Market size, 75–76
MDS. See Multi-dimensional scaling
(MDS)
Mean, 120
Median, 120
Meta tags, 261
Metrics
experts, 13
investment metrics, 42–43
operational metrics, 4, 61
optimization, 9, 68
analyzing data for, 70
determining time frame of, 69–70
identification of, 68–69
identifying influential attributes
of, 70–71
learning from marketing
campaigns for, 71
tools for, 70
process for identification of, 45–46
proxy metrics, 60
return metrics, 4, 42
role of, 4
selection of, 7–8
skills, 13
Mode, 120
Modern portfolio theory (MPT), 36
Mountaineer, customer acquisition
model of. See Direct mailing,
for customer acquisition
Moving average (MA) model, in time
series analysis, 181
MPT. See Modern portfolio theory
(MPT)
MS-Office, 13
Multi-channel campaign
cost and target volume of, 66
learning from, 71
performance optimization of, 67–71
performance reporting of, 65–67
Multi-dimensional scaling (MDS), 176
metric MDS, 177, 178–179
Multiple linear regression, 149–150
F statistic, 151
R2 measures, 150–151
N
Netting Inc., cross-selling strategies at
association model in, 242–243
validation results, 243–244
for online advertising, 241–242
Newsletters, 55–56
Newspapers, 51
Neyman-Pearson paradigm, 131
Nielsen Media Research, 108
Nominal (or categorical) variable, 117
Nonresidential equipment, 77
Nonstationary (or evolving) markets,
23
Normal distribution, 123
NPD Group, 109
Null hypotheses, 130
O
Oil prices, 79
OLAP tools, 13
Omnibus studies, 109
One-to-one marketing
by using direct mail, 55
by using e-mail, 55
by using telemarketing, 56
Online advertising, 51–53, 241–242
association model for, 242–243
validation results, 243–244
Online banner advertisement, 52
defined, 241
operational metrics for, 67, 68
Online request forms, 14
Online site analysis tools, 259–260
Online survey, for brand awareness,
248
Onsite search
description, 265–266
for information sites, 265
visitor segmentation for, 266–267
visit scenario analysis, 267–268
Open Directory Project (ODP), 259
Operational metrics, 61
for direct mail, 67
for e-mail, 68
for online banners, 67, 68
for paid search, 69
Ordinal data, 117
Organic search, 258
Organization of Petroleum Exporting
Countries (OPEC), 79
P
PageRank, 258
Paid listings, 258
Paid search, 258, 263–264. See also
Search engine marketing (SEM)
operational metrics for, 69
Panels, 108–109
‘Parent’ magazine, 51
Partial autocorrelation function
(PACF), in time series analysis,
182–183, 185
Partitioning methods, in cluster
analysis, 162–163
Pay per click, 54, 61. See also Search
engine marketing (SEM)
Pearson correlation coefficient,
126–127
People Meter, 108
Percentile, 121
Perceptual map, 98–99
advantages and disadvantages of,
100
vs. grid, 98
Phone interview, 107
Poisson distribution, 123–124
Population, defined, 118
Pop-under ads, 53
Pop-up ads, 53
Predictive dialing, of CATI, 107
Principal component analysis, 163, 165
Probability
defined, 118–119
density, 119
curve, 130, 131
density functions
of binomial distribution, 123
of chi-square (χ2) distribution,
124–125
of exponential distribution, 124
of F distribution, 125–126
of normal distribution, 123
of Poisson distribution, 123–124
of Student’s t distribution, 125
of uniform distribution, 122
distribution function, 119
mass, 119
Probability mass, 119
Product features, 9
Product life cycle, 79
Project
communication, 15
prioritization, 14–15
Proprietary algorithms, of search
engines, 258
Proxy metrics, 60
for brand awareness, 248–249
Prozac, 46–48
Purchase models, in customer
acquisition
financial impact of, 226–227
prospect scoring, 227
purchase after receiving catalog,
222–224
purchase without receiving catalog,
224–225
Q
Questionnaires. See Surveys
R
Random variables, 118
Range, 120
Rejection area, of probability density
curve, 130
Request for proposal (RFP), 103, 105
Research report, 112–113
Response modeling, 9
Response segmentation
CHAID analysis on
model building, 209–210
model validation, 210–211
Retention rate, 35
Return metrics, 42–43
at awareness stage, 57
at consideration stage, 58–59
at interest and relevance stage,
57–58
at loyalty and referral stage, 60
at purchase stage, 59
rollup of, 66
vs. operational metrics, 61
Return on investment (ROI), 43–44
tracking of, 44–45
Returns, measurement with metrics,
42
Revenue-driving factors, 71
Revenue flows, from direct and
indirect sales, 92–93
RFID (radio frequency identification),
80
RFP. See Request for proposal (RFP)
Rich media advertising, 52
Rollover ads, 53
S
Safe Net, customer growth model of,
229–231
Sales
direct and indirect, 92
affected by marketing activities, 28
stages of
awareness, 46–48
consideration, 48–49
interest and relevance, 48
loyalty and referral, 49
purchase, 49
Sample, defined, 118
Sample size, determination of
based on sample mean, 111
based on sample proportion, 111
Sampling methods
nonprobability, 110
probability, 109–110
SCANTRACK, 109
Search engine marketing (SEM), 53–54
described, 263
metrics, 265
paid search marketing, 263–264
resources, 264
vs. SEO, 265
Search engine optimization (SEO)
black hat optimization, 258
organizational implementation of,
259
vs. SEM, 265
website analysis in
Alt text, 262
domain analysis tools, 260
link popularity analysis tools,
260–261
meta tags, 261
online site analysis tools, 259–260
web page title tag, 261–262
white hat optimization, 258
Search engines, 53–54
marketing. See Search engine
marketing (SEM)
market share of, 76
optimization of. See Search engine
optimization (SEO)
proprietary algorithms of, 258
website listings in, 257–258
websites’ submission to, 259
SEM. See Search engine marketing
(SEM)
Semilogarithm model, 26
Seminars, 56
Senior managers, task of, 36
SEO. See Search engine optimization
(SEO)
Shareholder value, 36
Share of wallet, defined, 71
Sigma Corporation, 41
Sigmoid functions, 32
Simple linear regression, 146–148
key assumptions for, 149
SiteCatalyst, 13, 249
Skew, 121
Skyscraper ads, 52
Social science, skills in, 13
Software investment sectors, 77
Software tools, for web analytics,
247–248
Sommer’s coefficient, 130
Spearman’s rank correlation
coefficient, 128
S-shaped model, 26
Stakeholders
business objectives, 6–7
identifying, 6
Standard deviation, 121
Standard error, 121
Static models, 23–34
defined, 22
Statistics, skills of, 14
Stores, as a marketing communication
channel, 56
Streaming video, 52
Student’s t distribution, 125
Superstitial ads, 52–53
Surveys, market research
by direct mail, 106
by email, 106
by face-to-face interview, 108
by telephonic interview, 107
SWOT analysis, 96–98
Syndicated research. See also Market
research
companies, 75
data collection for, 105–106
determination of market size with,
75–76
limitations of, 90
vs. customized research, 101–102
T
Tabulation, 95–96
Target-audience
attributes of, 88–89
primary vs. secondary data, 105
segmentation of, 89–91
Target audience migration,
identification of, 46–49
Telemarketing, 56
Text links, sponsored, 52
Time series analysis, 22
ARIMA, 183, 184
autocorrelation function (ACF),
182–183, 185
Time series analysis (Continued)
autocorrelation in, 180
autoregressive (AR) model, 180–181
moving average (MA) model, 181
objective of, 179–180
partial autocorrelation function
(PACF), 182–183, 185
Toyota, 267
Tradeshows, 56
Travel Wind survey, on behavior and
demographics segmentation.
See Behavior and demographics
segmentation
t test, 131–133
U
UNC Bank, attrition model of, 227–229
Unemployment rate, and market
growth, 79
Uniform density function, of random
variable, 122
US government, and defense
investment, 77
V
Value segmentation
CART analysis of
model building, 207–208
model validation, 208
defined, 206
Variable costs, of investment, 33
Variance, 120
Vendor proposal, evaluation and
selection of, 105
Versatile Electronics survey, on
customer satisfaction
segmentation
attributes in, 211
discriminant analysis of
model building, 212–213
validation, 213–217
W
Wall Street Journal, 51
Wal-Mart, 80
Ward’s Error Sum of Squares (Ward’s
ESS), 156–157
Ward’s ESS. See Ward’s Error Sum of
Squares (Ward’s ESS)
War in Iraq, 77
Web analytics
applications of, 247
in brand awareness, 248–249
in customer support and service,
253
in e-commerce sales, 252–253
in lead generation, 250–252
software tools for, 247–248
in syndicated research, 253–254
in website content management,
249–250
Webinars, 56
Web page title tag, 261–262
Website analysis, in SEO
Alt text, 262
domain analysis tools, 260
link popularity analysis tools,
260–261
meta tags, 261
online site analysis tools, 259–260
web page title tag, 261–262
Website content management, 249–250
Websites. See also Search engines
advertising on, 51–52
calculating number of visitors on, 58
corporate, 54–55
generating online sales, 41
listings
free listings, 257–258
paid listings, 258
Prozac.com, 46
Websites’ submission, to search
engines, 259
White hat optimization, 258
Wonder Electronics’ survey, on
response segmentation. See
Response segmentation
Y
Yahoo
market share of, 76
online advertising on, 51
search engine marketing on, 52
Z
Zoomerang, 248
Z test, 131