Probability Models in Marketing 2

Web Site Example
• Web site for clothing catalogue company
• Company has customer data on purchases from
site, but wants to know more about all visitors to
their web site
• Buys web panel data
– from Nielsen//NetRatings or Media Metrix (not in NZ)
• E.g. Nielsen//NetRatings universe for the At Home Internet
audience measurement is all individuals aged 2+ living in
homes that have access to the Internet via a PC owned or
leased by a household member and using a Windows
operating system
Respondent Data
| ID | # of Visits | Income   | Sex | Age | HH Size |
|----|-------------|----------|-----|-----|---------|
| 1  | 0           | $87,500  | 1   | 48  | 2       |
| 2  | 5           | $17,500  | 1   | 57  | 1       |
| 3  | 0           | $65,000  | 0   | 28  | 2       |
| 4  | 0           | $55,000  | 1   | 52  | 3       |
| 5  | 0           | $55,000  | 1   | 17  | 3       |
| 6  | 0           | $55,000  | 0   | 19  | 3       |
| 7  | 0           | $72,500  | 0   | 39  | 2       |
| 8  | 1           | $125,000 | 0   | 59  | 2       |
| 9  | 0           | $22,500  | 0   | 70  | 1       |
| 10 | 0           | $55,000  | 0   | 47  | 3       |
Frequency Distribution
| Number of Visits | Frequency Count |
|------------------|-----------------|
| 0   | 2046 |
| 1   | 318  |
| 2   | 129  |
| 3   | 66   |
| 4   | 38   |
| 5   | 30   |
| 6   | 16   |
| 7   | 11   |
| 8   | 9    |
| 9   | 10   |
| 10+ | 55   |
Fit Poisson Model
• R Code:
  visit.dist <- c(2046,318,129,66,38,30,16,11,9,10,55)
  lpois <- function(lambda,data) {
    visits <- 0:9
    # log-likelihood: exact counts for 0-9 visits, plus the tail probability for 10+
    sum(data[1:10]*log(dpois(visits,lambda))) +
      data[11]*log(ppois(9,lambda,lower.tail=FALSE))
  }
  # optimise() minimises by default, so ask for the maximum of the log-likelihood
  optimise(function(param){lpois(param,visit.dist)}, c(0,10), maximum=TRUE)
• Result: maximum value of log-likelihood is
achieved at λ=0.72
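• The expected counts plotted on the next slide can be reproduced with a short
  sketch like the following (a minimal check, using the fitted λ = 0.72 and the
  panel size implied by the frequency table):
  N <- sum(visit.dist)                                   # 2728 panel members
  exp.counts <- c(N * dpois(0:9, 0.72),                  # 0 to 9 visits
                  N * ppois(9, 0.72, lower.tail=FALSE))  # 10+ visits
  round(exp.counts)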
Simple Poisson Model
[Figure: Fit of the Poisson Model – observed vs expected number of people (0 to 2500) by number of visits (0–9, 10+)]
Nature of Heterogeneity
• Unobserved (or random) heterogeneity
– The visiting rate λ is assumed to vary across the
population according to some distribution
– No attempt is made to explain why people differ in
their visiting rates
• Observed (or determined) heterogeneity
– Explanatory variables are observed for each person
– We explicitly link the value of λ for each person to
their values of the explanatory variables
• E.g. Poisson regression model
Poisson Regression Model
• Let Yi be the number of times that individual i visits the
web site
• Assume Yi is distributed as a Poisson random variable
with mean λi
• Suppose each individual’s mean λi is related to their
observed explanatory characteristics by
$$\lambda_i = e^{\beta' x_i}, \quad \text{or equivalently} \quad \ln \lambda_i = \beta' x_i$$
• Take logs of household income and age first
• R code, using glm function for Poisson regression:
glm.siteVisits <- glm(Visits ~ logHouseholdIncome + Sex + logAge
+ HH.Size, family=poisson(), data=siteVisits)
summary(glm.siteVisits)
Poisson Regression Estimates
|             | Coefficient | Std Error | t value |
|-------------|-------------|-----------|---------|
| Intercept   | -3.122      | 0.405     | -7.7    |
| log(income) | 0.093       | 0.034     | 2.7     |
| Sex         | 0.004       | 0.041     | 0.1     |
| log(age)    | 0.589       | 0.055     | 10.8    |
| Hhld size   | -0.036      | 0.015     | -2.3    |

LL (Pois. reg.) (A): -6291.5
LL (Pois.) (B): -6378.5
LR: 174 (df=4)
Can also fit model using maximum likelihood as for simple
Poisson model, but this will not give standard errors
Poisson vs Poisson Regression
• The simple Poisson model (model B) is
nested within the Poisson regression
model (model A)
• So we can use a likelihood ratio test to see
whether model A fits the data better
• Compute the test statistic
$$LR = 2\,(LL_A - LL_B)$$
  and reject the null hypothesis of no difference if $LR > \chi^2_{0.05,\,df}$
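• As a quick check, a minimal R sketch of this test using the log-likelihoods
  reported above:
  LL.A <- -6291.5                        # Poisson regression log-likelihood
  LL.B <- -6378.5                        # simple Poisson log-likelihood
  LR <- 2*(LL.A - LL.B)                  # = 174
  qchisq(0.95, df=4)                     # 5% critical value, about 9.49
  pchisq(LR, df=4, lower.tail=FALSE)     # p-value, effectively zero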
Expected Number of Visits
|           | Person 1 | Person 2 |
|-----------|----------|----------|
| Income    | $87,500  | $55,000  |
| Sex       | 1        | 0        |
| Age       | 48       | 19       |
| Hhld size | 2        | 3        |

$$\lambda_1 = \exp(-3.126 + 0.094\ln 87500 + 0.004 + 0.588\ln 48 - 0.036 \times 2) = 1.164$$
$$\lambda_2 = \exp(-3.126 + 0.094\ln 55000 + 0.588\ln 19 - 0.036 \times 3) = 0.621$$
• So person 2 should visit the site less often than person 1
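• A minimal sketch of the same calculation in R, reusing the fitted glm from
  earlier (the variable names follow that glm call; the new data frame here is
  purely illustrative):
  newpeople <- data.frame(
    logHouseholdIncome = log(c(87500, 55000)),
    Sex                = c(1, 0),
    logAge             = log(c(48, 19)),
    HH.Size            = c(2, 3)
  )
  # expected number of visits for each person under the Poisson regression model
  predict(glm.siteVisits, newdata=newpeople, type="response")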
Poisson Regression Model
[Figure: Fit of Poisson Regression – observed vs expected number of people (0 to 2500) by number of visits (0–9, 10+)]
Poisson Regression Fit
• Poisson regression model improves fit over
simple Poisson model
– But not by much
• Try introducing random heterogeneity instead of,
or as well as, observed heterogeneity
• Possibilities include:
  – Zero-inflated Poisson model
  – Zero-inflated Poisson regression
  – Negative binomial distribution
  – Negative binomial regression
Zero-inflated Poisson Model
• Assume that a proportion π of people never visit the site
• However other people visit according to a Poisson model
• Probability distribution:
$$P(Y_i = y) = \pi\,\delta_{0y} + (1-\pi)\,\frac{\lambda^y e^{-\lambda}}{y!}$$
  where δ0y = 1 if y = 0 and 0 otherwise
Zero-inflated Poisson Model
• Note that Poisson model predicts too few zeros
• Assume that a proportion π of people never visit the site
– Remaining people visit according to Poisson distribution
• No deterministic component
• R code:
  lzipois <- function(pi,lambda,data) {
    visits <- 1:9
    # zero cell mixes the "never visit" group with the Poisson zeros
    data[1]*log(pi + (1-pi)*dpois(0,lambda)) +
      sum(data[2:10]*log((1-pi)*dpois(visits,lambda))) +
      data[11]*log((1-pi)*ppois(9,lambda,lower.tail=FALSE))
  }
  # optim() minimises by default; fnscale=-1 makes it maximise the log-likelihood
  optim(c(0.5,1), function(param){lzipois(param[1],param[2],visit.dist)},
        control=list(fnscale=-1))
• Likelihood maximised at π=0.73, λ=2.71
Zero-Inflated Poisson Model
[Figure: Fit of the Zero-Inflated Poisson Model – observed vs expected number of people (0 to 2500) by number of visits (0–9, 10+)]
Zero-inflated Poisson Regression
• Can add deterministic heterogeneity to
zero-inflated Poisson (ZIP) model
• Again assume that a proportion π of
people never visit the site
• However other people visit according to
Poisson regression model
• Probability distribution:
$$P(Y_i = y) = \pi\,\delta_{0y} + (1-\pi)\,\frac{\left(e^{\beta' x_i}\right)^{y} e^{-e^{\beta' x_i}}}{y!}$$
Fit ZIP Regression Model
• R code:
  siteVisits <- read.csv("visits.csv")
  lzipreg <- function(param,data) {
    zpi <- param[1]
    # column 2 holds the visit counts; columns 3-6 hold the covariates
    lambda <- exp(param[2] + data[,3:6] %*% param[3:6])
    sum(log(ifelse(data[,2] == 0, zpi, 0) +
            (1-zpi)*dpois(data[,2],lambda)))
  }
  # maximise the log-likelihood (fnscale=-1), allowing extra iterations
  optim(c(.7,2,0,-0.1,0.1,0),
        function(param){lzipreg(param,as.matrix(siteVisits))},
        control=list(fnscale=-1, maxit=1000))
• Likelihood maximised at π=0.74, β=(1.90, -0.09, -0.13, 0.11, 0.02)
ZIP Regression Predictions
[Figure: Fit of ZIP Regression Model – observed vs expected number of people (0 to 2500) by number of visits (0–9, 10+)]
Simple NBD Model
• Recall the negative binomial distribution
– The number of visits Y made by each individual has a
Poisson distribution with rate λ
– λ has a Gamma distribution across the population
$$g(\lambda) = \frac{\beta^{\alpha}\,\lambda^{\alpha-1} e^{-\beta\lambda}}{\Gamma(\alpha)}, \quad \lambda > 0$$
– At the population level, the number of visits has a negative binomial distribution
$$P(Y = y) = \frac{\Gamma(\alpha + y)}{\Gamma(\alpha)\, y!} \left(\frac{\beta}{\beta+1}\right)^{\alpha} \left(\frac{1}{\beta+1}\right)^{y}$$
Fitting NBD Model
• R code:
  lnbd2 <- function(alpha,beta,data) {
    visits <- 0:9
    prob <- beta/(beta+1)
    sum(data[1:10]*log(dnbinom(visits,alpha,prob))) +
      data[11]*log(1-pnbinom(9,alpha,prob))
  }
  # maximise the log-likelihood (optim minimises by default)
  optim(c(1,1), function(param){lnbd2(param[1],param[2],visit.dist)},
        control=list(fnscale=-1))
• Likelihood maximised for α=0.157 and β=0.197
NBD Model Predictions
[Figure: Fit of the NBD Model – observed vs expected number of people (0 to 2500) by number of visits (0–9, 10+)]
NBD Regression
• Can also add deterministic heterogeneity to the NBD model
• Retain the Gamma heterogeneity in visiting rates, but scale each person's
  rate by the factor exp(g'xi) determined by their covariates
• Probability distribution:
$$P(Y_i = y) = \frac{\Gamma(\alpha + y)\,\beta^{\alpha}\left(e^{g' x_i}\right)^{y}}{\Gamma(\alpha)\, y!\,\left(\beta + e^{g' x_i}\right)^{\alpha + y}}$$
• Reduces to simple NBD model when g = 0
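• One way such a model can be fitted in R is with glm.nb from the MASS package
  (a sketch only, assuming the same siteVisits data frame and variable names
  used for the Poisson regression; the slides do not say exactly how the
  estimates below were obtained):
  library(MASS)   # provides glm.nb for negative binomial regression
  nb.siteVisits <- glm.nb(Visits ~ logHouseholdIncome + Sex + logAge + HH.Size,
                          data=siteVisits)
  summary(nb.siteVisits)   # reports coefficients plus the shape parameter theta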
NBD Regression Estimates
|             | Coefficient | Std Error | t value |
|-------------|-------------|-----------|---------|
| Intercept   | -4.047      | 1.102     | -3.7    |
| Theta (α)   | 0.139       | 0.007     | 19.1    |
| log(income) | 0.075       | 0.096     | 0.8     |
| Sex         | -0.005      | 0.116     | -0.0    |
| log(age)    | 0.890       | 0.141     | 6.3     |
| Hhld size   | -0.025      | 0.042     | -0.6    |
Can also fit model using maximum likelihood, but this will
not give standard errors
NBD Regression Fit
[Figure: Fit of NBD Regression – observed vs expected number of people (0 to 2500) by number of visits (0–9, 10+)]
Covariates In General
• Choose a probability distribution that fits the
individual-level outcome variable
– This has parameters (a.k.a. latent traits) θi
• Think of the individual-level latent traits θi as a
function of covariates x
• Incorporate a mixing distribution to capture the
remaining heterogeneity in the θi
– The variation in θi not explained by x
• Fit this model (e.g. using maximum likelihood)
New Concepts
• How to incorporate covariates in
probability models
– Poisson, zero-inflated Poisson and NBD
regression models for count data
• However, getting the outcome variable
distribution right was more crucial here
than introducing covariates
• Importance of covariates is often
exaggerated
Reach and Frequency Models
• Advertising is a major industry
– NZ Ad expenditure reached $1.5bn in 2000
– Many companies spend millions each year
• Crucial to understand the effects of this
expenditure
• Major outcomes include how many people are
reached by an ad campaign, and how many
times
– Known as reach and frequency (R&F)
– Typically analysis is limited to calculating media
exposure, not advertising exposure
Reach and Frequency Models
• Data on TV viewing, newspaper and magazine readership, radio listening, etc.
  is routinely gathered
– Ratings and readership figures determine the price of
space in these media
• However this data typically does not enable
detailed reach and frequency analysis
– E.g. readership questions ask about the last issue read, and how many of an
  average 4 issues were read
– Longitudinal data is collected on TV viewing, but item
non-response causes problems with direct analysis
• Models are needed to derive complete reach
and frequency analyses from the collected data
R&F Analysis Examples
Beta-Binomial Model for R&F
• If an advertiser has placed an ad in each of 10 issues of
a magazine, the beta-binomial model assumes that:
– Each person has a probability p of reading each issue
– These probabilities follow a beta distribution
$$g(p) = \frac{1}{B(\alpha,\beta)}\, p^{\alpha-1}(1-p)^{\beta-1}$$
– Each issue is read independently, between and across
individuals
• Distribution of # issues read for each person is binomial
• The resulting aggregate exposure distribution is the
beta-binomial
• Applied to R&F analysis by Metheringham (1964)
• Still widely used
• But not very accurate
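• A minimal R sketch of the resulting beta-binomial exposure distribution
  (illustrative parameter values only; a and b here are hypothetical, not
  estimates from these slides):
  # probability of reading y issues out of n when p ~ Beta(a, b)
  dbetabinom <- function(y, n, a, b) {
    choose(n, y) * beta(a + y, b + n - y) / beta(a, b)
  }
  # illustrative exposure distribution for a 10-issue schedule
  round(dbetabinom(0:10, n=10, a=0.2, b=1.5), 3)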
Typical Exposure Distribution
[Figure: typical exposure distribution – number of people (log scale, 1 to 10,000) by number of issues read (0–4)]
Modified BBM
• One problem with the beta-binomial model
is that it does not model loyal
viewers/readers/listeners well
• By adding a point mass at 1 to the beta
distribution of exposure probabilities, the
BBM can be modified to accommodate
loyal readers etc
– Derived by Chandon (1976); improved by
Danaher (1988), Austral. J. Statist.
Multiple Media Vehicles
• The BBM (and modified BBM) focus on
exposure to one media vehicle (e.g. one
magazine) over the course of an ad campaign
• Need to extend to multiple vehicles
– Model both reading choice and times read, in one
combined model
• Could assume independence
– E.g. Dirichlet-multinomial model
• Assumes independence of irrelevant alternatives (IIA)
– But there are known to be correlations between
different media vehicles
• E.g. women’s magazines, business papers, programmes on
TV1 vs TV3
Multiple Media Vehicles
• Models need to take correlations between media
vehicles into account
• Log-linear models have been used
– But these are computationally intensive for moderately large
advertising schedules
• Canonical expansion model (Danaher 1992)
– Uses Goodhardt and Ehrenberg’s “duplication of viewing” law to
minimise need for multivariate correlations
• Data on pairwise correlations used, but higher order joint
probabilities are derived using this law
– Higher order interactions are assumed to be zero
• Canonical expansions are used for the joint probabilities to minimise
computations
FMCG Sales/Purchasing
• Retail sales figures for fast moving consumer goods
– Have good aggregate weekly sales figures
• Data available down to SKU level
• Data collected at store level
– Know when total sales are changing over time
– Can also investigate overall response to promotions
• Using store level data can give more accurate results, and even
allow some segmentation by chain or region
– However sales figures cannot show us who is buying more when
sales increase, or who is affected by promotions
• Heavy buyers? Light buyers? New buyers?
• Households with kids? Retired couples? Flatters?
• Even when overall sales are flat, there may be hidden changes
– Marketing activities could be made more effective using this sort
of information, so how can we find out about this?
Household Purchasing Data
• Data about FMCG purchases collected from a
panel of households
– Can be collected through diaries
• Or even weekly interviews, based on recall
– Best method is currently to equip panel with scanners
• This is used by each household member to record all items
bought
• ACNielsen (NZ) runs a scanner panel of over 1000
households
– Data includes amount purchased, price, date, product
details down to SKU level
– Also have demographic characteristics of household
Common Research Questions
• Who buys my product?
– Perhaps better answered by U&A (usage and attitudes) study
• How much do they buy?
• How often?
• Who are my heavy buyers? Light buyers? Frequent
buyers?
• How many are repeat buyers?
• How does this compare to my other brands? How about
my competitors?
• Are my results normal?
– How do they compare to similar products in other categories?
Example of Purchase Data
[Table: purchases in week numbers 1–13 for Household 1, Household 2, Household 3, Household 4, …, showing which brand (A or B) was bought in each week, with “-” indicating no purchase; summarised by 4-week month on the next slide]
Results for 4-Week Months
Purchases by Month

|             | 1     | 2     | 3     | … | Total |
|-------------|-------|-------|-------|---|-------|
| Household 1 | 1A    | 3A,1B | 1A    | . | 5A,1B |
| Household 2 | 2A    | 1B    | 1B    | . | 2A,2B |
| Household 3 | 1B    | -     | 1A    | . | 1A,1B |
| Household 4 | -     | -     | -     | . | -     |
| Total       | 3A,1B | 3A,2B | 2A,1B | . | 8A,4B |
Observations
• Usually there will be a wide range of
purchasing intensity among buyers of
each brand
– Also a proportion who do not buy the brand
• Instead of a whole brand, we can also look
at a brand/package size combination
– Similar findings apply at both levels
Another Example
• Data gathered from a panel of 983 households
• Purchases of Lux Flakes over a 12 week period
– Various summary measures shown below
| # of Purchase Occasions                              | 1  | 2  | 3 | 4+ | Total | %    | Avg # Purchases per Buyer |
|------------------------------------------------------|----|----|---|----|-------|------|---------------------------|
| All buyers: # Buyers                                 | 17 | 3  | 2 | 0  | 22    | 100% | 1.3 |
| All buyers: # Units Bought                           | 17 | 6  | 6 | 0  | 29    | 100% |     |
| Bought in last 12 weeks (“repeat” buyers): # Buyers  | 6  | 2  | 1 | 0  | 9     | 41%  | 1.4 |
| Bought in last 12 weeks: # Units Bought              | 6  | 4  | 3 | 0  | 13    | 45%  |     |
| Didn’t buy in last 12 weeks (“new” buyers): # Buyers | 11 | 1  | 1 | 0  | 13    | 59%  | 1.2 |
| Didn’t buy in last 12 weeks: # Units Bought          | 11 | 2  | 3 | 0  | 16    | 55%  |     |
| Cumulative purchases from at least this # of purchase occasions | 29 | 12 | 6 | 0 | - | | |
Example (continued)
• Low penetration overall
– 22 buyers, about 2% of panel
• More than half the purchases were by
“new” buyers
• The cumulative purchasing distribution
looks similar to the cumulative reach
distributions from the last lecture
Negative Binomial Model
• Fit NBD model – assumes Poisson process for purchase
occasions, with Gamma heterogeneity
• R code:
  purchase.dist <- c(961,17,3,2)   # households making 0, 1, 2 and 3 purchases
  lnbd3 <- function(alpha,beta,data) {
    purchases <- 0:3
    prob <- beta/(beta+1)
    sum(data[1:4]*log(dnbinom(purchases,alpha,prob)))
  }
  # maximise the log-likelihood (optim minimises by default)
  optim(c(1,1), function(param){lnbd3(param[1],param[2],purchase.dist)},
        control=list(fnscale=-1))
• Likelihood maximised for α=0.045 and β=1.514
Negative Binomial Model
• Can also fit the model based on the
observed values of two quantities
– The proportion of people p0 making no
purchases during the study period
– The mean number of purchases made m
(assuming that only one item is purchased at
each purchase occasion)
• Then solve for α and β numerically
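• A minimal sketch of this “mean and zeros” approach in R, using m = α/β and
  P(0) = (β/(β+1))^α with the Lux Flakes figures above (variable names here are
  illustrative):
  p0 <- 961/983     # proportion of the panel making no purchases
  m  <- 29/983      # mean number of purchases per household
  # substitute alpha = m*beta into p0 = (beta/(beta+1))^alpha and solve for beta
  f <- function(beta) m*beta*log(beta/(beta+1)) - log(p0)
  beta.hat  <- uniroot(f, c(0.01, 100))$root
  alpha.hat <- m*beta.hat
  c(alpha=alpha.hat, beta=beta.hat)   # should be close to the ML estimates above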
Multivariate NBD
• Generalise to multiple time periods with
durations Ti, i=1,…,t
• Various partitionings of the Ti lead to variables
that are also NBD
– E.g. divide into the first s time periods and the
remaining t - s
– The values for the latter t - s periods, conditional on
those for the first s, are multivariate NBD
• α is incremented by the total purchases from the first s
periods, and the mean is updated as a weighted average of
the original mean and the observed mean.
• So can easily apply empirical Bayes techniques using this
model
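• A hedged sketch of the empirical Bayes updating this implies, using the
  conjugate Gamma-Poisson structure (all values below are purely illustrative):
  alpha <- 0.5; beta <- 2               # prior: purchase rate per period ~ Gamma(alpha, beta)
  x <- 3; s <- 4                        # a household made x purchases over the first s periods
  post.mean <- (alpha + x)/(beta + s)   # posterior mean rate for that household
  remaining <- 8                        # number of periods still to come
  post.mean * remaining                 # expected purchases over the remaining periods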
NBD Model for Longer Periods
• Another property of the NBD is that
purchases over a longer time period are
also NBD (assuming that the purchasing
process remains the same)
• The mean number of purchases increases
in proportion to the length of the period
• But the parameter α remains fixed
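• A small sketch of this property under the parameterisation used earlier: for a
  period k times as long, α stays fixed and the prob parameter becomes β/(β+k)
  (k = 2 below is illustrative; α and β are the Lux Flakes estimates):
  alpha <- 0.045; beta <- 1.514
  k <- 2                                        # e.g. 24 weeks instead of 12
  dnbinom(0:3, size=alpha, prob=beta/(beta+k))  # purchase distribution over the longer period
  alpha/beta * k                                # mean scales in proportion to period length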
NBD Model
• The NBD model has been applied to products in
a wide range of categories
• It generally fits very well
• The main exception (for diary data) is when the
recording period is too short compared to the
purchase frequency
– Often people record shopping once in each period,
rather than multiple times
– Can cause problems if many people are expected to
purchase once or more each period
α is Usually Constant
• Typically α will be relatively constant
across different products in the same
category
• This means that the heterogeneity in
purchasing rates is similar across products
• However β will vary to reflect the
penetrations of the different products
Multiple Brands
• So far have only looked at the NBD for
purchases of a single brand
• Want to model multiple brands
• Will use a combined model for brand
choice and for number of purchases
– The NBD-Dirichlet distribution
Model for Brand Choice
• Assume that brand choices are made
independently for each purchase, with each
individual i having a fixed probability pij of
choosing brand j
• These probabilities pij are assumed to vary
among people according to a Dirichlet
distribution
– This is a generalisation of the Beta distribution
The Dirichlet Distribution
• Recall that the Beta distribution has pdf
$$g(p) = \frac{1}{B(\alpha,\beta)}\, p^{\alpha-1}(1-p)^{\beta-1} = \frac{\Gamma(\alpha_1+\alpha_2)}{\Gamma(\alpha_1)\,\Gamma(\alpha_2)}\, \theta_1^{\alpha_1-1}\, \theta_2^{\alpha_2-1}$$
  setting θ1 = p, θ2 = (1-p), α1 = α and α2 = β
• The Dirichlet distribution generalises this
to k dimensions
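• For reference, the k-dimensional Dirichlet density this generalises to
  (standard form, not written out in the original slides):
$$g(\theta_1,\dots,\theta_k) = \frac{\Gamma(\alpha_1 + \dots + \alpha_k)}{\Gamma(\alpha_1)\cdots\Gamma(\alpha_k)} \prod_{j=1}^{k} \theta_j^{\alpha_j - 1}, \qquad \theta_j \ge 0,\ \sum_{j=1}^{k} \theta_j = 1$$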
Brand Choice
• These assumptions mean that the joint
distribution of brands purchased, across
all consumers, is a mixture of multinomials
with a Dirichlet distribution
– i.e. a Dirichlet-multinomial distribution
– For two brands, this is just the familiar Beta-binomial distribution
Purchasing Model
• We now turn to the purchasing process
• Assume that purchases made by individual i
occur randomly and independently with mean
rate λi, resulting in a Poisson process
• Also assume that the means vary across the
population according to a Gamma distribution
• This means that the number of purchases will
follow a negative binomial distribution
Combined Model
• Finally, assume that the purchase rates
and brand choice probabilities are
independent of one another
• The resulting distribution of brand
purchases follows an NBD-Dirichlet
distribution (often called Dirichlet for short)
– This has k+2 parameters, one for each brand
and 2 for the NBD purchase distribution
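• A minimal simulation sketch of the combined model for one household
  (illustrative parameter values, not estimates from these slides):
  alpha <- 0.5; beta <- 2                          # NBD parameters for the purchase process
  a <- c(2, 1, 0.5)                                # Dirichlet parameters for k = 3 brands
  lambda <- rgamma(1, shape=alpha, rate=beta)      # household purchase rate
  g <- rgamma(length(a), shape=a)
  p <- g/sum(g)                                    # Dirichlet draw of brand-choice probabilities
  n <- rpois(1, lambda)                            # total purchases in the period
  as.vector(rmultinom(1, size=n, prob=p))          # purchases split across the brands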
Discussion
• This model has been found to describe many
aspects of buyer behaviour well, across a wide
range of situations
• It assumes that buying behaviour is stationary,
i.e. is not showing any trends over time
• The Dirichlet distribution assumes that the
probabilities are independent apart from the
constraint that they sum to 1
– In marketing terms, this means that the market is not
segmented
• The proportion of purchases that go to one brand is
independent of how the remaining purchases are spread
across the other brands
Discussion (continued)
• One implication of this is an additivity
property
– Any two brands can be combined into a
“super-brand” with expected purchases equal
to the sum of the individual brand means
– The rest of the model is unaffected by this
change
• The NBD-Dirichlet has also been applied
to pack sizes, stores, TV programmes, etc
Example of Model Fit
A Single Brand
• The NBD-Dirichlet model does not give exactly
an NBD distribution for a single brand
– However in practice the difference is minimal
Duplication Between Brands
• Generally constant down columns
– Reflects unsegmented nature of most markets
Duplication Law
• The duplication between two brands is usually
proportional to the product of their penetrations
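• Expressed as a formula (the standard statement of the law; this notation is
  not from the original slides):
$$b_{XY} \approx D\, b_X\, b_Y$$
  where bX and bY are the penetrations of vehicles X and Y, bXY is the
  proportion exposed to both, and D is a duplication coefficient that is
  roughly constant across pairs of vehicles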