Optimal Database Marketing Drozdenko & Drake, © 2002
Chapter 7
Analyzing and Manipulating
Customer Data
Objectives
• Getting to Know Your Data
• The Analysis
• Univariate Tabulations
• Cross Tabulations
• Logic Counter Variables
• Ratio Variables
• Longitudinal Variables
• Time Alignment of Key Events
• Reducing Customer Data Via Correlation Analysis
• Statistical Background - Correlation Analysis
• Using Excel to Run Correlations
Getting to Know Your Data
It is imperative that a direct marketing professional develop an intimate
knowledge of the company’s customer data prior to any attempt to analyze
it.
The familiarization process should include meeting with the staff
responsible for creating and maintaining the marketing data as well as
those responsible for applying data updating rules.
A detailed comprehension of the logic applied when designing the
customer file and the rules employed for updating each piece of
information contained on the file is crucial for analysis.
Getting to Know Your Data (Cont.)
Questions to ask may include:
• Does a partial payment for a product order update (increment) the
“Total Products Paid” field?
• If a partial payment for a product order increments the “Total Products
Paid,” does it do so for all product lines?
• If a member of the company’s music club orders a CD rack from the
club’s monthly catalogue, does payment for that order update the “Total
Music Club Dollars Paid” field?
• If a customer orders and pays, then immediately cancels, is the “Total
Products Paid” field properly decremented? Is the “Total Products
Ordered” field properly decremented?
Getting to Know Your Data (Cont.)
The criteria for selecting names for a product promotion are only as good
as the data residing on the database.
When customer data elements are improperly maintained or inconsistently
updated, or if rules are illogical or inconsistent from one product line
to another, the error rate grows exponentially.
Recommend changes if inconsistencies between product lines or errors
in the logic are found.
The odds are, the longer the direct marketing firm has been in business
and the more product lines present, the more errors and inconsistencies
will exist.
The Analysis
There are numerous methods available to help you assess the
customer data residing on a frozen file when determining your target
market for a particular product offering. These techniques include the
assessment of:
• Univariate tabulations
• Cross tabulations
• Logic counter variables
• Ratio variables
• Longitudinal variables
These techniques for viewing and manipulating customer data
will help you determine which variables are the most important in terms
of defining your target market.
The Analysis (Cont.)
Whether defining your target market, building a response model or
segmenting the customer file, 50% - 75% of the analyst’s time should be
spent on data preparation using the techniques previously mentioned.
This is true even when using a “data-mining” tool. If the data is not well
prepared and understood prior to “feeding” it into such a piece of
software, the analysis will yield sub-optimal and potentially misleading
results.
More on data-mining tools will be discussed in Chapter 10.
Univariate Tabulations
In preparing to determine a “select,” build a target model, or segment a
customer file, univariate analysis is the most commonly used form of
analysis.
A univariate analysis involves the tabulation and viewing of the
categories of a single variable or data element.
In the case of a product offer test, a univariate tabulation produced on
an analysis sample (Chapter 6) will display the percent responders and
non-responders to the offer for the various categories of each data
element.
This analysis enables the analyst to determine which data elements
have a strong relationship with the customer behavior of concern
(ordering, paying, renewing, etc.).
Univariate Tabulations (Cont.)
Assume ACME Direct, a direct marketer of books, music, videos and magazines, test
promoted a new music CD collection titled “Pop Rock USA” (PRUSA) to 20,000
customers selected from the customer segment “5+ paid orders in the past 24
months” as shown below.
Entire Customer File (Universe Size = 4,670,513)
  • Customers with past purchases that have been written-off
    (Universe Size = 323,156)
  • Customers who frequently return merchandise (and have no written-off
    purchases) (Universe Size = 533,966)
  • Remaining customers (the balance) (Universe Size = 3,813,391)
      • Customers with 5+ paid orders in the past 24 months
        (Universe Size = 1,256,479)
        20,000 sample taken from this customer segment.
      • Remaining customers (the balance) (Universe Size = 2,556,912)
Univariate Tabulations (Cont.)
The test response rate received was 2.50%. The sample was split
50/50 -- 10,000 names for analysis and 10,000 names for validation
(Chapter 6).
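The 50/50 split described above could be sketched as follows. This is a minimal illustration only: the customer IDs, the function name split_sample and the fixed seed are assumptions made for the example, not ACME Direct's actual procedure (sampling itself is covered in Chapter 6).

```python
import random


def split_sample(names, seed=2002):
    """Randomly split a list of names into two halves (analysis, validation)."""
    shuffled = list(names)                 # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed (an assumption) for repeatability
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Hypothetical customer IDs standing in for the 20,000 test-promoted names.
test_names = [f"CUST{i:05d}" for i in range(20000)]
analysis, validation = split_sample(test_names)
print(len(analysis), len(validation))
```

Fixing the seed lets the same analysis and validation files be recreated later; any random assignment that keeps the two halves disjoint would serve the same purpose.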
The analyst was asked to examine various univariate tabulations
produced on the analysis segment to determine if any of the following
four data elements appeared to distinguish responders from
non-responders for this particular offer.
• an age indicator
• an indicator of past music purchase activity
• the total number of orders placed by each customer
• the total number of promotions sent to each customer
Univariate Tabulations (Cont.)
Starting with the analysis of age, an example of a typical univariate
tabulation appears on the next slide. This tabulation displays how “Pop
Rock USA” (PRUSA) orders break out by various age categories.
Note: The categories can vary depending on how you decide to view
the data. You can have as many age categories as you like; however,
keep in mind that if you break out the variable into too many categories,
it may result in sparse data. See the book for more details regarding
this topic.
Univariate Tabulations (Cont.)
Head of Household Age    Number   % of Sample   Number of Orders   Response Rate   Index to Total
30 and under              1,529        15.29%                 67           4.38%              175
31-40                     1,775        17.75%                 63           3.55%              142
41-50                     1,879        18.79%                 46           2.45%               98
51-60                     2,054        20.54%                 29           1.41%               56
61 and over               1,785        17.85%                 18           1.01%               40
No age info available       978         9.78%                 27           2.76%              110
Total                    10,000       100.00%                250           2.50%              100
The column titled “Number” represents the number of names in the
sample of 10,000 falling into each age category. For example, in this
sample 1,529 people out of 10,000 are 30 years of age and under.
The “% of Sample” represents the percent of the corresponding
“Number” as it relates to the total sample size. In this case, 1,529
represents 15.29% of the total.
Univariate Tabulations (Cont.)
The “Number of Orders” column represents how many of the total 250
orders for this particular product offer fell into each category. In this
example, 67 of the 250 total orders for this product were placed by
individuals 30 years of age and under.
The response rate to this product offer is 4.38% (67/1,529) for those
customers 30 years of age and under.
Univariate Tabulations (Cont.)
The index value for customers 30 years of age and under is calculated
by dividing the response rate of this group by the response rate for
the entire sample and multiplying by 100 (4.38% / 2.50% x 100 = 175).
An index value of 175 for this group tells you that the response
rate for this age group is 75% higher than the response rate for all
names. This 75% figure is called the “gain in response.”
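As a sketch of how the columns of the age tabulation fit together, the counts from the table can be turned into “% of Sample,” “Response Rate” and “Index to Total” as follows. The names age_tab and univariate_table are illustrative assumptions, not part of the text.

```python
# Age tabulation from the slide: category -> (names in sample, orders received).
age_tab = {
    "30 and under": (1529, 67),
    "31-40": (1775, 63),
    "41-50": (1879, 46),
    "51-60": (2054, 29),
    "61 and over": (1785, 18),
    "No age info available": (978, 27),
}


def univariate_table(tab):
    """Return rows of (category, pct_of_sample, response_rate, index_to_total)."""
    total_names = sum(names for names, _ in tab.values())
    total_orders = sum(orders for _, orders in tab.values())
    overall_rate = total_orders / total_names
    rows = []
    for category, (names, orders) in tab.items():
        rate = orders / names if names else 0.0          # guard empty categories
        index = round(100 * rate / overall_rate) if overall_rate else 0
        rows.append((category, names / total_names, rate, index))
    return rows


for category, pct, rate, index in univariate_table(age_tab):
    # Gain in response is simply the index minus 100.
    print(f"{category:<22} {pct:7.2%} {rate:6.2%} {index:4d} {index - 100:+4d}")
```

The last printed column is the gain in response relative to the whole sample; a negative value marks a category that responds below the sample average.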
How does one interpret the index value of 40 associated with the “61
and over” group?
Univariate Tabulations (Cont.)
Assuming the break-even for this particular product offering is a
response rate of at least 3.00%, which names could the product manager
profitably promote using age information alone?
If the goal of this promotion is to generate 5% profit after overhead and
the resulting response rate required to generate this profit level is found
to be 4.25%, which names can the product manager promote to ensure
her goal is met?
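One way to work through questions like these is to keep only the categories whose response rate meets the required threshold. The sketch below assumes the age counts from the tabulation above; the function name categories_meeting is hypothetical.

```python
# Age tabulation from the slide: category -> (names in sample, orders received).
age_tab = {
    "30 and under": (1529, 67),
    "31-40": (1775, 63),
    "41-50": (1879, 46),
    "51-60": (2054, 29),
    "61 and over": (1785, 18),
    "No age info available": (978, 27),
}


def categories_meeting(tab, min_rate):
    """List the categories whose response rate is at least min_rate."""
    return [
        category
        for category, (names, orders) in tab.items()
        if names > 0 and orders / names >= min_rate
    ]


break_even = categories_meeting(age_tab, 0.0300)   # 3.00% break-even
profit_goal = categories_meeting(age_tab, 0.0425)  # 4.25% needed for the profit goal
print("Break-even:", break_even)
print("Profit goal:", profit_goal)
```

The same helper can be reused with any of the other tabulations in this chapter by swapping in their counts.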
Univariate Tabulations (Cont.)
Last year ACME Direct promoted customers on the database for “Rock
and Roll Party” (RRP).
Would you think those on the database that bought a rock music
package from ACME in the past would be more or less likely to order a
new rock music package from ACME today?
The analyst can test this intuition by tabbing the data to determine
whether customers in the analysis sample who ordered last year’s rock
package (RRP) were more likely or less likely to have ordered the new
rock package (PRUSA).
The tabulation on the next slide displays how PRUSA orders break out
by the various customer actions taken when promoted for last year’s
RRP.
Univariate Tabulations (Cont.)
Action to Last Year's
Rock and Roll Party      Number   % of Sample   Number of Orders   Response Rate   Index to Total
Promoted & Ordered          877         8.77%                 51           5.82%              233
Promoted & Not Ordered    3,967        39.67%                 93           2.34%               94
Not Promoted              3,911        39.11%                 73           1.87%               75
Not Available             1,245        12.45%                 33           2.65%              106
Total                    10,000       100.00%                250           2.50%              100
Customers that were promoted last year for RRP and ordered it have
the highest order rate for PRUSA (the first category above). Why?
Univariate Tabulations (Cont.)
Customers that were promoted for RRP last year but did not order (the
second category above) have a PRUSA response rate below the
response rate of the entire sample (2.34% vs. 2.50%). Why?
Univariate Tabulations (Cont.)
Additionally, customers “not promoted” for RRP last year have a PRUSA
response rate even lower than customers “promoted and not ordered”
(1.87% vs. 2.34%). Why?
Univariate Tabulations (Cont.)
Who exactly are the customers identified as “Not Available” in the above
tabulation?
Univariate Tabulations (Cont.)
Assuming the break-even for this particular product offering is a
response rate of at least 3.00%, which names could the product manager
profitably promote using this particular piece of past purchase
information alone?
Univariate Tabulations (Cont.)
Next, the analyst examines “Total Number of Orders Ever” across all
product lines shown below. This tabulation displays for the analyst how
“Pop Rock USA” orders break out by various categories regarding each
customer’s count of past orders placed. The categories can vary and
depend on how you decide to view the data. For this example, we
created 5 categories.
Total Number of Orders
Ever (all product lines)   Number   % of Sample   Number of Orders   Response Rate   Index to Total
0                               0         0.00%                  0           0.00%                0
1-5                         3,312        33.12%                 62           1.87%               75
6-10                        3,074        30.74%                 68           2.21%               88
11-15                       2,205        22.05%                 64           2.90%              116
15 +                        1,409        14.09%                 56           3.97%              159
Total                      10,000       100.00%                250           2.50%              100
Assuming the product manager decides to promote any category
meeting her break-even of 3.00%, which names can she profitably
promote?
Univariate Tabulations (Cont.)
Note that for this tabulation the response rates increase as the number
of orders increases. This is a pattern we like to see for such counter
variables; however, this is not always the case.
Some tabulations may reflect a slight “bouncing” of the response rates
but a pattern should be obvious. A severe bouncing around of response
rates may reflect a problem with the sample and/or the data.
Univariate Tabulations (Cont.)
In the final step, the analyst examines the univariate tabulation of “Total
Number of Promotions Sent Ever” across all product lines shown below.
Based on promotional data, can the product manager find a group to
profitably promote?
Total Number of Promotions
Ever (all product lines)   Number   % of Sample   Number of Orders   Response Rate   Index to Total
1-5                             0         0.00%                  0           0.00%                0
6-10                          768         7.68%                 16           2.08%               83
11-20                       2,544        25.44%                 57           2.24%               90
21-30                       3,563        35.63%                108           3.03%              121
31 +                        3,125        31.25%                 69           2.21%               88
Total                      10,000       100.00%                250           2.50%              100
Univariate Tabulations (Cont.)
When examining promotional counter variables on their own, it is
important to keep in mind they can be very misleading.
Why is this?
Univariate Tabulations (Cont.)
This is why you will typically not see the response rate monotonically
(i.e., smoothly) increasing or decreasing as was the case for the
variable “Total Number of Orders Ever.” Too many other things are
going on here.
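A quick check for the smooth pattern described above might look like the following sketch. It uses the counts from the “Total Number of Orders Ever” and “Total Number of Promotions Sent Ever” tabulations, skips empty categories, and treats “monotonic” as never reversing direction; the helper names are assumptions made for illustration.

```python
def response_rates(tab):
    """Response rates for the non-empty categories, in category order."""
    return [orders / names for names, orders in tab if names > 0]


def is_monotonic(rates):
    """True if the rates never decrease, or never increase, across categories."""
    pairs = list(zip(rates, rates[1:]))
    return all(a <= b for a, b in pairs) or all(a >= b for a, b in pairs)


# (names, orders) per category, in the order shown in the tabulations.
orders_tab = [(0, 0), (3312, 62), (3074, 68), (2205, 64), (1409, 56)]
promos_tab = [(0, 0), (768, 16), (2544, 57), (3563, 108), (3125, 69)]

print("Orders counter monotonic:    ", is_monotonic(response_rates(orders_tab)))
print("Promotions counter monotonic:", is_monotonic(response_rates(promos_tab)))
```

A failed check is only a prompt to look closer at the variable; as noted earlier, a slight bouncing of response rates can still hide an obvious overall pattern.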
Univariate Tabulations (Cont.)
Can you think of some way in which you can
further exploit and use this promotional counter data?
Cross Tabulations
Cross-tabulations are a means of viewing two or more data elements in
combination. “Cross-tabbing” highlights interrelationships among
variables. It can take a data element that is relatively weak in terms
of its predictive strength and change it into a powerful predictor.
Continuing with our example, the analyst cross-tabulates the
“promotion” counter field with the “total orders” counter field. This
tabulation displays for the analyst how the “Pop Rock USA” orders
break out by various combinations of the univariate order and promotion
categories.
Cross Tabulations (Cont.)
Each cell shows the response rate with (orders/names) in parentheses.

                    Total Promotions Ever:
Total Orders Ever:  1-5                6-10               11-20              21-30              31 plus            Total
0                   0.00% (0/0)        0.00% (0/0)        0.00% (0/0)        0.00% (0/0)        0.00% (0/0)        0.00% (0/0)
1-5                 0.00% (0/0)        1.63% (8/491)      1.76% (17/967)     2.34% (20/856)     1.60% (16/998)     1.87% (62/3,312)
6-10                0.00% (0/0)        2.89% (8/277)      1.85% (14/756)     2.51% (29/1,154)   1.80% (16/887)     2.21% (68/3,074)
11-15               0.00% (0/0)        0.00% (0/0)        3.03% (14/462)     3.03% (29/956)     2.67% (21/787)     2.90% (64/2,205)
15 plus             0.00% (0/0)        0.00% (0/0)        3.34% (12/359)     5.03% (30/597)     3.53% (16/453)     3.97% (56/1,409)
Total               0.00% (0/0)        2.08% (16/768)     2.24% (57/2,544)   3.03% (108/3,563)  2.21% (69/3,125)   2.50% (250/10,000)
The “Total” for each row represents the “univariate figures” for that
variable. In this case, these values match the response rate figures on
the univariate tabulation for “Total Number of Orders Ever.”
Cross Tabulations (Cont.)
The “Total” for each column represents the “univariate figures” for that
variable. In this case, these values match the response rate figure on
the univariate tabulation for “Total Number of Promotions Ever.”
Cross Tabulations (Cont.)
Cells not associated with the “Total” row and “Total” column represent the intersection of
these two pieces of information. For example, 491 names in the sample of 10,000 have
“1-5” orders and “6-10” promotions. This same group yielded 8 of the 250 total orders
for an order rate of 1.63% (8/491).
Why are there zero names falling into the cell with “15 plus orders” and “1-5 promotions”?
Cross Tabulations (Cont.)
Why does no one in this sample have zero orders?
Cross Tabulations (Cont.)
Assuming the break-even for this particular product offering is a
response rate of at least 3.00%, which names could the product
manager profitably promote using this cross tabulation of information?
Cross Tabulations (Cont.)
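The product manager can promote the cells with a response rate of at
least 3.00% (“11-15” orders with “11-20” or “21-30” promotions, and
“15 plus” orders with 11 or more promotions), yielding
28.27% of the sample = (462+956+359+597+453)/10,000.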
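The cell selection above can be reproduced with a short sketch. The dictionary below holds the non-empty cells of the cross tabulation as (names, orders); the variable names and the BREAK_EVEN constant are assumptions made for illustration.

```python
BREAK_EVEN = 0.03  # 3.00% break-even response rate

# (orders category, promotions category) -> (names, orders); empty cells omitted.
cells = {
    ("1-5", "6-10"): (491, 8),
    ("1-5", "11-20"): (967, 17),
    ("1-5", "21-30"): (856, 20),
    ("1-5", "31 plus"): (998, 16),
    ("6-10", "6-10"): (277, 8),
    ("6-10", "11-20"): (756, 14),
    ("6-10", "21-30"): (1154, 29),
    ("6-10", "31 plus"): (887, 16),
    ("11-15", "11-20"): (462, 14),
    ("11-15", "21-30"): (956, 29),
    ("11-15", "31 plus"): (787, 21),
    ("15 plus", "11-20"): (359, 12),
    ("15 plus", "21-30"): (597, 30),
    ("15 plus", "31 plus"): (453, 16),
}

# Keep every cell whose response rate meets the break-even.
selected = {
    key: (names, orders)
    for key, (names, orders) in cells.items()
    if orders / names >= BREAK_EVEN
}
selected_names = sum(names for names, _ in selected.values())
total_names = sum(names for names, _ in cells.values())
share = selected_names / total_names

for (orders_cat, promos_cat), (names, orders) in selected.items():
    print(f"{orders_cat:>8} orders / {promos_cat:<8} promotions: {orders / names:.2%}")
print(f"Share of sample selected: {share:.2%}")
```

Working at the cell level rather than on either univariate total is what allows the “11-15” orders group to be partially selected.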
Cross Tabulations (Cont.)
Based solely on the data field “total orders,” the product manager was
able to promote only those names with 15+ orders (assuming a
break-even response rate of 3.00%).
By cross-tabulating this data field with “total promotions sent,” the
product manager was able to additionally promote those with 11-15
orders if the number of promotions sent does not exceed 30.
By cross-tabulating two data fields the product manager was able to
double the number of names she can promote (2,827 vs. 1,409) and still
meet her break-even criterion.
Counter Variables
When you have several data elements measuring the same customer
“dimension,” consider the creation of a counter variable.
For example, if you have several data elements on your database all
measuring the customer’s interest in cooking, consider combining them
and creating a single counter variable.
The resulting variable will be, in most cases, stronger than the
independent component variables and help you better assess a
customer’s true interest in cooking (or whatever it is you are trying to
measure).
Counter Variables
There are three reasons to create counter variables:
• They provide a means of data reduction.
• Combining similar data elements into one strengthens the predictive
power of low-coverage data elements, such as some enhancement data.
• Logic counter variables serve an important role in the development of
a parsimonious and stable regression model. (This topic is discussed in
greater detail in Chapter 10.)
Counter Variables (Cont.)
Suppose you enhance your database with the following five pieces of
information from an outside vendor relating to cooking interests:
• Hobby Cooking? (1 if yes, 0 otherwise)
• Buy Cookbooks? (1 if yes, 0 otherwise)
• Enjoy Wine? (1 if yes, 0 otherwise)
• Have a Vegetable Garden? (1 if yes, 0 otherwise)
• Subscribe to Cooking Magazines? (1 if yes, 0 otherwise)
What are your options to use such data in a name selection?
Counter Variables (Cont.)
You could examine each variable separately via univariate tabulations or
combine them into a counter variable.
To create this “cooking counter” variable you simply count the number of
“yes” responses for each customer.
In this example, it will tell you the degree of interest each customer has
in cooking based on the number of “yes” responses. The more “yes”
responses, the stronger the customer’s interest in cooking.
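Counting the “yes” responses could be sketched as below. The field names and the sample record are hypothetical, and treating a missing enhancement field as “no” is an assumption made for this example.

```python
# Hypothetical field names for the five cooking-related enhancement flags.
COOKING_FLAGS = (
    "hobby_cooking",
    "buy_cookbooks",
    "enjoy_wine",
    "vegetable_garden",
    "cooking_magazines",
)


def cooking_counter(record):
    """Count the cooking flags set to 1; a missing flag is treated as 0."""
    return sum(1 for flag in COOKING_FLAGS if record.get(flag) == 1)


# Hypothetical customer with no data returned for "cooking_magazines".
customer = {
    "hobby_cooking": 1,
    "buy_cookbooks": 1,
    "enjoy_wine": 0,
    "vegetable_garden": 1,
}
print(cooking_counter(customer))
```

Because low-coverage fields contribute nothing when missing, the counter still yields a usable value for customers with only partial enhancement data.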
What is the range of possible values that this new counter variable can
take on?
Counter Variables (Cont.)
You can also create counter variables from internal product purchase
information, assuming the products are related in some manner. For
example, suppose over the past three years ACME Direct offered four
different rock music CD collections for sale:
• Rock and Roll Party
• The Soul of Rock and Roll
• Early Rock Legends
• Easy Listening Rock
Counter Variables (Cont.)
You could examine each product variable separately via univariate
tabulations (as we did earlier for RRP) or combine them into a counter
variable.
To create this “rock music counter” variable you simply count, for each
customer, how many of these four rock titles they purchased. In this
case, the logic counter variable will tell you the degree of interest each
customer has in rock music based on how many different rock music CD
collections they purchased. The more rock music CD collections
purchased, the stronger their interest in rock music.
What is the range of possible values that this new logic counter variable
can take on?
Counter Variables (Cont.)
These new variables can be programmed (coded) and used to select
customers for an upcoming promotion.
For example, in preparation for a new rock music CD collection
promotion, ACME Direct might select anyone from the database that
purchased at least two of the past rock music CD collections offered.
Counter Variables (Cont.)
The figure below displays three customers selected at random from the
analysis sample of 10,000 customers test promoted for “Pop Rock
USA.”
What are the values of the new “rock music counter” variable for each of
these three customers?
Customer   Customer     Total $                                   Rock Music
Name       Address      Paid       …   RRP   TSRR   ERL   ELR     Counter
S. Jones   123 Main     $356.34    …   PD    PNO    PNO   PD      ?
B. Smith   8th Ave.     $643.22    …   PNO   PD     PD    PD      ?
K. Brown   45 Oak St.   $264.98    …   NP    NP     PNO   PD      ?

LEGEND: RRP=Rock and Roll Party, TSRR=The Soul of Rock and Roll, ERL=Early Rock Legends,
ELR=Easy Listening Rock, PD=Paid, PNO=Promoted but not ordered, NP=Not promoted.
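One way to fill in the “?” column is sketched below. It assumes that only a status of PD (Paid) counts as a purchase, and it applies the “at least two rock collections” selection mentioned earlier; the function and variable names are illustrative.

```python
ROCK_TITLES = ("RRP", "TSRR", "ERL", "ELR")

# Status codes for the three customers in the figure (PD, PNO, NP per the legend).
customers = {
    "S. Jones": {"RRP": "PD", "TSRR": "PNO", "ERL": "PNO", "ELR": "PD"},
    "B. Smith": {"RRP": "PNO", "TSRR": "PD", "ERL": "PD", "ELR": "PD"},
    "K. Brown": {"RRP": "NP", "TSRR": "NP", "ERL": "PNO", "ELR": "PD"},
}


def rock_music_counter(actions):
    """Count how many of the four rock titles were paid for (assumed = purchased)."""
    return sum(1 for title in ROCK_TITLES if actions.get(title) == "PD")


counters = {name: rock_music_counter(actions) for name, actions in customers.items()}
# Select anyone who purchased at least two of the past rock collections.
selected = [name for name, count in counters.items() if count >= 2]

print(counters)
print(selected)
```

If orders that were placed but not yet paid should also count as purchases, the comparison inside rock_music_counter would need to accept those status codes as well.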
Counter Variables (Cont.)
Once the value of this new “rock music counter” variable is determined
for everyone on the sample, the analyst can display the results by
creating a univariate tabulation in SAS™. Doing so will allow the analyst to
determine this new variable’s strength in terms of distinguishing
responders from non-responders for “Pop Rock USA” as shown below.
Rock Logic: (RRP, TSRR,
ERL, ELR)                Number   % of Sample   Number of Orders   Response Rate   Index to Total
Purchased 0               7,856        78.56%                118           1.50%               60
Purchased 1                 945         9.45%                 47           4.97%              199
Purchased 2                 633         6.33%                 40           6.32%              253
Purchased 3                 365         3.65%                 27           7.40%              296
Purchased 4                 201         2.01%                 18           8.96%              358
Total                    10,000       100.00%                250           2.50%              100
Counter Variables (Cont.)
It is not always obvious which data elements are related to one another.
In such cases one can run a multivariate analysis to determine which
data elements are correlated with one another. Once determined, the
correlated data elements can be combined into a counter variable.
Multivariate analysis will be discussed in more detail in Chapter 8.
Ratio Variables
Ratio variables are the result of dividing one data element by another
and can be easily programmed in SAS™ or any spreadsheet
application. The data elements comprising the ratio variable must be
continuous in nature. Below are examples of ratio variables created
from data on ACME Direct’s customer file.

RATIO VARIABLE                                MEASUREMENT OF
Total Products Paid for each customer         Estimate of average payment rate for each
divided by Total Products Ordered             customer

Total Book Products Paid divided by Total     Strength of the book affinity as opposed to
Products Paid                                 others (music, videos, gifts, etc.)

Total Orders divided by Total Promotions      Overall responsiveness to all promotions
Sent

Total Music Paid Orders divided by Total      Estimate of average paid response rate for
Music Promotions Sent                         music orders
Ratio Variables (Cont.)
Below is a partial printout of three customers from the 10,000-name analysis
sample tested for “Pop Rock USA.” All data is representative of the
point-in-time of the promotion. Based on “Total Orders” information alone, which of
the three customers is most likely to respond to this offer?

Customer Name   Customer Address   Total Orders
T. Bluestone    555 Maple                    10
R. Stewart      56 South Main                 7
J. Jackson      111 Rocky Rd.                 2
Ratio Variables (Cont.)
Based on the “Total Promotions” information alone, shown below, who is
more likely to respond to this offer? (Assume you were forced to make
a selection based on this information alone.)

Customer Name   Customer Address   Total Promotions
T. Bluestone    555 Maple                        84
R. Stewart      56 South Main                    55
J. Jackson      111 Rocky Rd.                    12
Ratio Variables (Cont.)
Consider the ratio of “Total Orders” to “Total Promotions” for each
customer and determine who is more likely to respond to this offer. The
creation of the ratio variable is shown below.
Customer Name   Customer Address   Total Promotions   Total Orders   Ratio of Orders to Promotions
T. Bluestone    555 Maple                        84             10   10/84 = 11.90%
R. Stewart      56 South Main                    55              7   7/55 = 12.73%
J. Jackson      111 Rocky Rd.                    12              2   2/12 = 16.67%
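The ratio above could be computed and used to rank the three customers as in the following sketch. Returning None when a customer has received no promotions is an assumption made here to avoid dividing by zero; the names ratio and ranked are illustrative.

```python
def ratio(numerator, denominator):
    """Return numerator/denominator, or None when the denominator is zero."""
    return numerator / denominator if denominator else None


# (customer name, total orders, total promotions) from the printout.
customers = [
    ("T. Bluestone", 10, 84),
    ("R. Stewart", 7, 55),
    ("J. Jackson", 2, 12),
]

# Rank from most to least responsive; an undefined ratio sorts last.
ranked = sorted(customers, key=lambda c: ratio(c[1], c[2]) or 0.0, reverse=True)

for name, orders, promotions in ranked:
    print(f"{name:<13} {orders:>3}/{promotions:<3} = {ratio(orders, promotions):.2%}")
```

Note that the ranking by ratio is the reverse of the ranking by “Total Orders” alone for these three customers.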
Ratio Variables (Cont.)
Given what you have learned so far in this chapter,
can you think of another way that the product manager could have
combined these two pieces of data other than a ratio variable?
Longitudinal Variables
Longitudinal variables allow a direct marketer to view a particular data
element for each customer across time. Conceptually, they are similar
to time-series variables.
They can be quite difficult to implement.
They are based on the premise that the best predictor of a customer’s
response to a future promotion is a review of his most recent past
responses and reactions to promotions.
Longitudinal Variables (Cont.)
The figure below displays some examples of longitudinal/time-series
variables.
LONGITUDINAL VARIABLE                               MEASUREMENT OF
Customer’s response (order, pay, silent, etc.)      Estimate of customer’s action on next
to their last 3 promotions sent                     promotion sent
Customer’s action (pay, return, bad-debt) to        Estimate of customer’s performance on next
their last 3 orders placed                          order placed
Customer’s last 3 product affinity shipments        Estimate of customer’s most current music
for the music club (rock, pop, country, etc.)       interests
Longitudinal Variables (Cont.)
The figure below displays information on three customers taken from the
ACME Direct database. The information relates to customer responses
and reactions to the last three promotions they received from ACME
Direct in addition to their overall ratio of “total paid products ever to total
promotions ever.”
Based on the ratio of “total paid products ever to total promotions ever,”
who would you select for an upcoming promotion?
Customer Name    Ratio of total paid products ever to total promotions ever
A. Flintstone    .2546
P. Johnson       .3796
X. Wesley        .1408
Longitudinal Variables (Cont.)
If you considered the three most recent actions taken by these
customers as shown below, who would you select for an upcoming
promotion?
Customer Name    Ratio of total paid products     Two Promotions    One Promotion    Latest
                 ever to total promotions ever    Ago               Ago              Promotion
A. Flintstone    .2546                            Non-Response      Non-Response     Order and Pay
P. Johnson       .3796                            Order and Pay     Non-Response     Non-Response
X. Wesley        .1408                            Non-Response      Order and Pay    Order and Pay
Longitudinal Variables (Cont.)
These types of variables also allow a direct marketer to examine which
customers are becoming weaker or stronger over time.
For example, a customer with a history of ordering products at a high
rate may suddenly become non-responsive to offers. By identifying
change (for the better or the worse) in customer behavior via
longitudinal variables, the direct marketer will be in a good position to
implement various “customer relationship management” or CRM
programs to either reward or combat such changes in behavior.
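One possible way to flag strengthening or weakening customers is sketched below. The customer values are those from the preceding slides; the data layout and the comparison rule (recent paid-response rate versus lifetime ratio) are hypothetical assumptions made for illustration, not a method prescribed here. With only three recent promotions, such a flag is a rough signal at best.

```python
# Values are from the slides; the layout and the trend rule are
# illustrative assumptions only.
PAID = "Order and Pay"

customers = {
    # name: (lifetime paid-to-promotion ratio, responses from oldest to latest)
    "A. Flintstone": (0.2546, ["Non-Response", "Non-Response", PAID]),
    "P. Johnson": (0.3796, [PAID, "Non-Response", "Non-Response"]),
    "X. Wesley": (0.1408, ["Non-Response", PAID, PAID]),
}


def recent_rate(responses):
    """Share of the recent promotions that ended in a paid order."""
    return sum(r == PAID for r in responses) / len(responses)


def trend(lifetime_ratio, responses):
    """Compare recent behavior with the lifetime ratio (hypothetical rule)."""
    recent = recent_rate(responses)
    if recent > lifetime_ratio:
        return "strengthening"
    if recent < lifetime_ratio:
        return "weakening"
    return "stable"


trends = {name: trend(ratio, resp) for name, (ratio, resp) in customers.items()}

for name, label in trends.items():
    print(f"{name}: {label}")
```

Under this rule the customer with the best lifetime ratio is the one flagged as weakening, which is the kind of change a CRM program would want to catch.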
Time Alignment of Key Events
Direct marketers who regularly communicate with customers due to the
“continuous” nature of their business may require the “time alignment” of
key events in order to best leverage the customer data. Such
businesses include:
• Clubs (e.g. music, video, etc.)
• Continuities or collectible marketers (e.g. books)
• Frequent buyer clubs
• Frequent travel services
• Banking and financial services
• Telecommunications and cable companies
Time Alignment of Key Events (Cont.)
For example, club direct marketers cannot compare customers who joined
the club at different points in time and expect to make proper decisions
about how to treat such customers.
To say that one customer is better than another in this environment, the
direct marketer must ensure that both customers joined the club at or
around the same point in time, came into the club with the same offer, or
came from the same source.
Time Alignment of Key Events (Cont.)
To develop predictors defining your best customers in a club
environment, you may be well advised to first group all customers on the
date of the initial enrollment in service. Doing so will take into account
any differences, for example, in offers made over time and allow for
accurate customer evaluation.
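A minimal sketch of grouping customers on enrollment date follows. Everything in it is hypothetical: the member records, the field names, the choice of enrollment month as the grouping level, and the six-month performance window are assumptions chosen only to illustrate the idea.

```python
from collections import defaultdict
from datetime import date

# Hypothetical member records; IDs, dates, and field names are made up.
members = [
    {"id": 1, "enrolled": date(2001, 1, 15), "paid_in_first_6_months": 4},
    {"id": 2, "enrolled": date(2001, 1, 28), "paid_in_first_6_months": 1},
    {"id": 3, "enrolled": date(2001, 6, 3), "paid_in_first_6_months": 2},
    {"id": 4, "enrolled": date(2001, 6, 20), "paid_in_first_6_months": 5},
]


def cohort_key(member):
    """Group by enrollment year and month (one possible granularity)."""
    return (member["enrolled"].year, member["enrolled"].month)


# Collect members into enrollment cohorts.
cohorts = defaultdict(list)
for member in members:
    cohorts[cohort_key(member)].append(member)

# Compare members only with others who enrolled in the same month,
# using a measure taken over the same length of time since enrollment.
best_by_cohort = {
    key: max(group, key=lambda m: m["paid_in_first_6_months"])["id"]
    for key, group in cohorts.items()
}

print(best_by_cohort)
```

Measuring every member over the same window after enrollment keeps longer-tenured members from looking better simply because they have had more time to buy.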
Time Alignment of Key Events (Cont.)
Additionally, you may be well advised to align your customers on the
first up-sell purchase.
Failing to align customers on the first up-sell could result in comparing
the “time to pay for the first up-sell product” for customers who were
given different up-sell product offers (where some of the up-sell product
offers may have been better than others).
Reducing Customer Data Via Correlation Analysis
Many direct marketers have hundreds, even thousands, of data elements
(internal and external) residing on their database.
To individually assess the strength of each one every time a target
definition is to be determined would be costly in terms of time expended.
Prior to beginning the analysis, many direct marketers run a correlation
analysis for each variable of a continuous nature to determine which
variables are the most correlated with the action to be predicted (order,
payment, renewal, etc.).
This helps subset the list of candidate variables to be examined to a
more manageable number.
Once subset, techniques previously discussed such as univariate and
cross-tabulation analyses are conducted.
Reducing Customer Data Via Correlation Analysis (Cont.)
Correlation analysis can answer questions such as:
• Is the data element “total number of books purchased in the past 12
months” positively correlated with a customer’s likelihood of ordering a
new video series?
• Is income level positively correlated with the amount a customer
spends on a catalogue promotion? In other words, do customers with
higher incomes spend more on catalogue orders?
Reducing Customer Data Via Correlation Analysis (Cont.)
The measure of correlation is a value between -1 and +1 that reflects
both the strength and the direction of the relationship:
Positive Correlation: When two variables are said to exhibit a positive
correlation this implies that higher values of one variable are associated
with higher values of another variable. The closer the number is to +1,
the stronger the positive relationship.
Negative Correlation: When two variables are said to exhibit a negative
correlation, this implies that higher values of one variable are associated
with lower values of another variable. The closer the number is to -1,
the stronger the negative relationship.
Zero Correlation: When two variables are said to exhibit zero
correlation, this implies that higher values of one variable are no more
likely to accompany higher values of the other than lower values, and
vice versa. In other words, no association exists between the two variables.
Reducing Customer Data Via Correlation Analysis (Cont.)
Continuing with the example, the figure below displays the correlation
coefficients associated with various data elements versus the likelihood
of ordering “Pop Rock USA.”
Data Element                                                  Correlation Coefficient    P-Value
                                                              with Ordering
Total Promotions Sent Ever                                     0.01                      .2534
Total Promotions Sent in the Past 36 Months                    0.02                      .1443
Total Products Paid Ever                                       0.12                      < .0001
Total Products Paid in the Past 36 Months                      0.17                      < .0001
Total Dollars Paid Ever                                        0.13                      < .0001
Total Dollars Paid in the Past 36 Months                       0.18                      < .0001
Total Number of Contemporary Music Albums Purchased Ever       0.19                      < .0001
Total Number of Cookbooks Purchased Ever                       0.10                      .0004
Total Number of Classical Music Products Purchased Ever       -0.09                      .0436
Household Annual Income                                        0.01                      .1274
Individual Age                                                -0.13                      < .0001
Reducing Customer Data Via Correlation Analysis (Cont.)
Also displayed are “p-values.” A p-value indicates whether the observed sample
correlation for a variable is significantly different from zero. The smaller the
p-value, the stronger the evidence that the data element is correlated with
ordering; here, values below .1000 are treated as significant.
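As a rough sketch, the screening step can be expressed as a filter on the p-value. The coefficients and p-values below are those from the table; storing “< .0001” as 0.0001 is an assumption made so the values can be compared numerically, and the variable names are illustrative.

```python
# (data element, correlation with ordering, p-value) taken from the table.
# "< .0001" is stored as 0.0001, an upper bound used only for comparison.
results = [
    ("Total Promotions Sent Ever", 0.01, 0.2534),
    ("Total Promotions Sent in the Past 36 Months", 0.02, 0.1443),
    ("Total Products Paid Ever", 0.12, 0.0001),
    ("Total Products Paid in the Past 36 Months", 0.17, 0.0001),
    ("Total Dollars Paid Ever", 0.13, 0.0001),
    ("Total Dollars Paid in the Past 36 Months", 0.18, 0.0001),
    ("Total Number of Contemporary Music Albums Purchased Ever", 0.19, 0.0001),
    ("Total Number of Cookbooks Purchased Ever", 0.10, 0.0004),
    ("Total Number of Classical Music Products Purchased Ever", -0.09, 0.0436),
    ("Household Annual Income", 0.01, 0.1274),
    ("Individual Age", -0.13, 0.0001),
]

ALPHA = 0.10  # significance cutoff quoted on the slide

# Keep variables whose correlation differs significantly from zero.
keep = [(name, r) for name, r, p in results if p < ALPHA]
drop = [name for name, r, p in results if p >= ALPHA]

# Strongest relationships first, regardless of direction.
keep.sort(key=lambda item: abs(item[1]), reverse=True)

for name, r in keep:
    print(f"{r:+.2f}  {name}")
print("Set aside:", drop)
```

Sorting by absolute value keeps negatively correlated elements such as age in view, since a strong negative relationship is as useful for targeting as a positive one.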
Reducing Customer Data Via Correlation Analysis (Cont.)
Why does “total products paid in the past 36 months” have a slightly
higher positive correlation with ordering PRUSA than “total products
paid ever”?
Reducing Customer Data Via Correlation Analysis (Cont.)
Why does “total number of contemporary music albums purchased ever”
have a much higher positive correlation with ordering PRUSA than “total
products paid ever”?
Reducing Customer Data Via Correlation Analysis (Cont.)
Which data elements show no relationship (either positively or
negatively) with ordering PRUSA?
Statistical Background - Correlation Analysis
The strength of the linear relationship between two variables is
assessed by using correlation analysis.
For example, are age and income somehow related? In other words,
employ correlation analysis when interested in asking “Do older people
have higher incomes on average than younger people?”
Two variables can be positively correlated, negatively correlated or
exhibit no correlation.
Statistical Background - Correlation Analysis (Cont.)
Positive Correlation: Ten customers were selected at random from the
ACME Direct database. We note their ages and income levels via
enhancement data as shown below.
Age    Income
59     $92,000
32     $43,000
19     $27,000
22     $24,000
45     $41,000
55     $65,000
47     $60,000
36     $62,000
25     $41,000
51     $85,000
A scatter plot of this data created in Excel is displayed below.
[Figure: Scatter Plot of Age vs. Income (x-axis: Age; y-axis: Income)]
Statistical Background - Correlation Analysis (Cont.)
Negative Correlation: Ten customers were selected at random from the
ACME Direct database. We note their total number of classical/opera
and country music CD packages purchased in the past as shown below.
Classical/Opera Purchases    Country Purchases
6                            1
3                            4
1                            7
2                            4
5                            2
8                            0
7                            1
4                            2
2                            5
1                            4
A scatter plot of this data created in Excel is displayed below.
[Figure: Plot of Classical/Opera vs. Country Music Purchases (x-axis: Classical/Opera; y-axis: Country)]
Statistical Background - Correlation Analysis (Cont.)
Zero Correlation: Ten customers were randomly selected from the
ACME Direct database. We note their ages and total cookbook
purchases to date as shown below.
Age    Cookbook Purchases
21     3
33     5
76     7
45     2
61     5
23     7
66     3
55     3
51     6
42     5
A scatter plot of this data created in Excel is displayed below.
[Figure: Plot of Age vs. Cookbook Purchases (x-axis: Age; y-axis: Cookbook Purchases)]
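As a numerical check on what the plots suggest, the sketch below computes the correlation coefficient (defined later in this section) for the classical/country and age/cookbook samples. The helper `pearson_r` is an illustrative name, and it uses the deviations-from-the-mean form of the calculation, which is algebraically equivalent to the sum-of-squares formula given later.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Correlation coefficient computed from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Sample data as listed on the slides.
classical = [6, 3, 1, 2, 5, 8, 7, 4, 2, 1]
country = [1, 4, 7, 4, 2, 0, 1, 2, 5, 4]
age = [21, 33, 76, 45, 61, 23, 66, 55, 51, 42]
cookbooks = [3, 5, 7, 2, 5, 7, 3, 3, 6, 5]

r_negative = pearson_r(classical, country)  # strongly negative
r_zero = pearson_r(age, cookbooks)          # close to zero

print(f"Classical/Opera vs. Country: {r_negative:.2f}")
print(f"Age vs. Cookbook Purchases:  {r_zero:.2f}")
```

A small sample will rarely give a coefficient of exactly zero, so “zero correlation” in practice means a value close to zero.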
Statistical Background - Correlation Analysis (Cont.)
To determine the degree to which two variables are correlated (either positively or negatively), we
calculate the correlation coefficient, denoted r. (The corresponding population value is denoted by
the Greek letter rho.) The correlation coefficient r is a value between -1 and +1 that reflects both
the strength and the direction of the relationship.
The formula to determine the correlation coefficient follows:
$$r = \frac{SS_{xy}}{\sqrt{SS_{xx}\,SS_{yy}}}$$
Where:
$$SS_{xy} = \sum xy - \frac{(\sum x)(\sum y)}{n}$$
$$SS_{xx} = \sum x^2 - \frac{(\sum x)^2}{n}$$
$$SS_{yy} = \sum y^2 - \frac{(\sum y)^2}{n}$$
and:
$\sum$ means “the sum of”
Statistical Background - Correlation Analysis (Cont.)
Reconsidering our age and income example we first must calculate the
products and the sums of these products for the x and y data as shown
below.
Age (X)    Income in Thousands (Y)    X^2           Y^2           XY
59         92                         3,481         8,464         5,428
32         43                         1,024         1,849         1,376
19         27                         361           729           513
22         24                         484           576           528
45         41                         2,025         1,681         1,845
55         65                         3,025         4,225         3,575
47         60                         2,209         3,600         2,820
36         62                         1,296         3,844         2,232
25         41                         625           1,681         1,025
51         85                         2,601         7,225         4,335
Σ = 391    Σ = 540                    Σ = 17,131    Σ = 33,874    Σ = 23,677
Statistical Background - Correlation Analysis (Cont.)
Next, calculate SSxy, SSxx and SSyy:
$$\begin{aligned}
SS_{xy} &= \sum xy - \frac{(\sum x)(\sum y)}{n} \\
        &= 23{,}677 - \frac{(391)(540)}{10} \\
        &= 23{,}677 - 21{,}114 \\
        &= 2{,}563
\end{aligned}$$
$$\begin{aligned}
SS_{xx} &= \sum x^2 - \frac{(\sum x)^2}{n} \\
        &= 17{,}131 - \frac{(391)(391)}{10} \\
        &= 17{,}131 - 15{,}288.1 \\
        &= 1{,}842.9
\end{aligned}$$
$$\begin{aligned}
SS_{yy} &= \sum y^2 - \frac{(\sum y)^2}{n} \\
        &= 33{,}874 - \frac{(540)(540)}{10} \\
        &= 33{,}874 - 29{,}160 \\
        &= 4{,}714
\end{aligned}$$
Statistical Background - Correlation Analysis (Cont.)
And, calculate r:
$$\begin{aligned}
r &= \frac{SS_{xy}}{\sqrt{SS_{xx}\,SS_{yy}}} \\
  &= \frac{2{,}563}{\sqrt{(1{,}842.9)(4{,}714)}} \\
  &= \frac{2{,}563}{\sqrt{8{,}687{,}430.6}} \\
  &= \frac{2{,}563}{2{,}947.4} \\
  &= .87
\end{aligned}$$
Therefore, the correlation coefficient between income and age is +.87. This value is very close to
+1, indicating that a very strong positive relationship exists between these two variables.
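The hand calculation can be reproduced with a short Python sketch that follows the same sum-of-squares steps. The function name `correlation` and the variable names are illustrative; the data are the ten age and income pairs from the example, with income in thousands.

```python
from math import sqrt

# Age and income (in thousands) for the ten sampled customers.
age = [59, 32, 19, 22, 45, 55, 47, 36, 25, 51]
income = [92, 43, 27, 24, 41, 65, 60, 62, 41, 85]


def correlation(xs, ys):
    """Correlation coefficient r via the SSxy, SSxx, and SSyy quantities."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    ss_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    ss_xx = sum(x * x for x in xs) - sum_x ** 2 / n
    ss_yy = sum(y * y for y in ys) - sum_y ** 2 / n
    return ss_xy / sqrt(ss_xx * ss_yy)


r = correlation(age, income)
print(round(r, 2))  # 0.87
```

Because r is unaffected by rescaling either variable, expressing income in thousands rather than dollars gives the same result.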
Running Correlation Analysis in Excel
We will now do an example of how to run a correlation analysis in Excel.
Running Correlation Analysis in Excel (Cont.)
First select “Tools” then “Data Analysis.” Select “Correlation” in the list
of options and click “OK.”
Running Correlation Analysis in Excel (Cont.)
Next, highlight the input data and select the area to anchor the upper left
hand corner of the output to be generated. Click “OK.”
Running Correlation Analysis in Excel (Cont.)
The resulting correlation will appear.
Running Correlation Analysis in Excel (Cont.)
NOTE: If you cannot find the “Data Analysis” option on the “Tools”
menu, you will need to load the Analysis ToolPak feature. You can do
this by selecting the “Add-Ins” feature on the “Tools” menu and ticking
the “Analysis ToolPak.”
You may need the original CD ROM used when loading Excel for this to
work.