Web Site Example
• Web site for a clothing catalogue company
• The company has customer data on purchases from the site, but wants to know more about all visitors to its web site
• Buys web panel data – from Nielsen//NetRatings or Media Metrix (not in NZ)
• E.g. the Nielsen//NetRatings universe for the At Home Internet audience measurement is all individuals aged 2+ living in homes that have access to the Internet via a PC owned or leased by a household member and using a Windows operating system

Respondent Data

  ID   # of Visits   Income     Sex   Age   HH Size
   1        0        $87,500     1     48      2
   2        5        $17,500     1     57      1
   3        0        $65,000     0     28      2
   4        0        $55,000     1     52      3
   5        0        $55,000     1     17      3
   6        0        $55,000     0     19      3
   7        0        $72,500     0     39      2
   8        1        $125,000    0     59      2
   9        0        $22,500     0     70      1
  10        0        $55,000     0     47      3

Frequency Distribution

  Number of Visits   Frequency Count
         0                2046
         1                 318
         2                 129
         3                  66
         4                  38
         5                  30
         6                  16
         7                  11
         8                   9
         9                  10
        10+                 55

Fit Poisson Model
• R code:

  visit.dist <- c(2046, 318, 129, 66, 38, 30, 16, 11, 9, 10, 55)
  lpois <- function(lambda, data) {
    visits <- 0:9
    sum(data[1:10] * log(dpois(visits, lambda))) +
      data[11] * log(ppois(9, lambda, lower.tail = FALSE))   # 10+ category
  }
  # maximum = TRUE so that optimise maximises the log-likelihood
  optimise(function(param) { lpois(param, visit.dist) }, c(0, 10), maximum = TRUE)

• Result: the maximum of the log-likelihood is achieved at λ = 0.72

Simple Poisson Model
[Chart: Fit of the Poisson Model – observed vs expected number of people by number of visits, 0 to 10+]

Nature of Heterogeneity
• Unobserved (or random) heterogeneity
  – The visiting rate λ is assumed to vary across the population according to some distribution
  – No attempt is made to explain why people differ in their visiting rates
• Observed (or determined) heterogeneity
  – Explanatory variables are observed for each person
  – We explicitly link the value of λ for each person to their values of the explanatory variables
  – E.g. the Poisson regression model

Poisson Regression Model
• Let Yi be the number of times that individual i visits the web site
• Assume Yi is distributed as a Poisson random variable with mean λi
• Suppose each individual's mean λi is related to their observed explanatory characteristics by
  λi = e^(xi′β), or equivalently ln(λi) = xi′β
• Take logs of household income and age first
• R code, using the glm function for Poisson regression:

  glm.siteVisits <- glm(Visits ~ logHouseholdIncome + Sex + logAge + HH.Size,
                        family = poisson(), data = siteVisits)
  summary(glm.siteVisits)

Poisson Regression Estimates

               Coefficient   Std Error   t value
  Intercept       -3.122       0.405      -7.7
  log(income)      0.093       0.034       2.7
  Sex              0.004       0.041       0.1
  log(age)         0.589       0.055      10.8
  Hhld size       -0.036       0.015      -2.3

  LL (Poisson regression) (A): -6291.5
  LL (Poisson) (B):            -6378.5
  LR: 174 (df = 4)

• Can also fit the model by maximum likelihood as for the simple Poisson model, but this will not give standard errors
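The likelihood-ratio comparison in the table above (LR = 174 on 4 df; the test itself is set out on the next slide) can be reproduced directly from the two log-likelihoods. A minimal R sketch, where llA could equally be taken from logLik(glm.siteVisits):

  llA <- -6291.5                 # log-likelihood, Poisson regression (model A)
  llB <- -6378.5                 # log-likelihood, simple Poisson (model B)
  LR  <- 2 * (llA - llB)         # = 174
  LR > qchisq(0.95, df = 4)      # TRUE, so reject the null of no difference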
Poisson vs Poisson Regression
• The simple Poisson model (model B) is nested within the Poisson regression model (model A)
• So we can use a likelihood ratio test to see whether model A fits the data better
• Compute the test statistic LR = 2(LL_A − LL_B) and reject the null hypothesis of no difference if LR > χ²(0.05, df)

Expected Number of Visits

              Person 1    Person 2
  Income      $87,500     $55,000
  Sex          1           0
  Age          48          19
  Hhld size    2           3

  λ1 = exp(−3.126 + 0.094 ln 87500 + 0.004 + 0.588 ln 48 − 0.036 × 2) = 1.164
  λ2 = exp(−3.126 + 0.094 ln 55000 + 0.588 ln 19 − 0.036 × 3) = 0.621

• So person 2 should visit the site less often than person 1

Poisson Regression Model Fit
[Chart: Fit of Poisson Regression – observed vs expected number of people by number of visits, 0 to 10+]

Poisson Regression Fit
• The Poisson regression model improves the fit over the simple Poisson model
  – But not by much
• Try introducing random heterogeneity instead of, or as well as, observed heterogeneity
• Possibilities include:
  – Zero-inflated Poisson model
  – Zero-inflated Poisson regression
  – Negative binomial distribution
  – Negative binomial regression

Zero-inflated Poisson Model
• Assume that a proportion π of people never visit the site
• However, other people visit according to a Poisson model
• Probability distribution:
  P(Yi = y) = π + (1 − π) e^(−λ),          y = 0
  P(Yi = y) = (1 − π) λ^y e^(−λ) / y!,     y ≥ 1

Zero-inflated Poisson Model
• Note that the Poisson model predicts too few zeros
• Assume that a proportion π of people never visit the site
  – The remaining people visit according to a Poisson distribution
• No deterministic component
• R code:

  lzipois <- function(pi, lambda, data) {
    visits <- 1:9
    data[1] * log(pi + (1 - pi) * dpois(0, lambda)) +
      sum(data[2:10] * log((1 - pi) * dpois(visits, lambda))) +
      data[11] * log((1 - pi) * ppois(9, lambda, lower.tail = FALSE))
  }
  # fnscale = -1 so that optim maximises the log-likelihood
  optim(c(0.5, 1), function(param) { lzipois(param[1], param[2], visit.dist) },
        control = list(fnscale = -1))

• Likelihood maximised at π = 0.73, λ = 2.71

Zero-Inflated Poisson Model
[Chart: Fit of the Zero-Inflated Poisson Model – observed vs expected number of people by number of visits, 0 to 10+]

Zero-inflated Poisson Regression
• Can add deterministic heterogeneity to the zero-inflated Poisson (ZIP) model
• Again assume that a proportion π of people never visit the site
• However, other people visit according to a Poisson regression model
• Probability distribution, with λi = e^(xi′β):
  P(Yi = y) = π + (1 − π) e^(−λi),          y = 0
  P(Yi = y) = (1 − π) λi^y e^(−λi) / y!,    y ≥ 1

Fit ZIP Regression Model
• R code:

  siteVisits <- read.csv("visits.csv")
  lzipreg <- function(param, data) {
    zpi <- param[1]
    lambda <- exp(param[2] + data[, 3:6] %*% param[3:6])
    sum(log(ifelse(data[, 2] == 0, zpi, 0) + (1 - zpi) * dpois(data[, 2], lambda)))
  }
  # fnscale = -1 so that optim maximises the log-likelihood
  optim(c(0.7, 2, 0, -0.1, 0.1, 0),
        function(param) { lzipreg(param, as.matrix(siteVisits)) },
        control = list(fnscale = -1, maxit = 1000))

• Likelihood maximised at π = 0.74, β = (1.90, −0.09, −0.13, 0.11, 0.02)

ZIP Regression Predictions
[Chart: Fit of ZIP Regression Model – observed vs expected number of people by number of visits, 0 to 10+]

Simple NBD Model
• Recall the negative binomial distribution
  – The number of visits Y made by each individual has a Poisson distribution with rate λ
  – λ has a Gamma distribution across the population:
    g(λ) = β^α λ^(α−1) e^(−βλ) / Γ(α),   λ > 0
  – At the population level, the number of visits has a negative binomial distribution:
    P(Y = y) = [Γ(α + y) / (Γ(α) y!)] (β / (β + 1))^α (1 / (β + 1))^y
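A small simulation sketch of the mixing argument on this slide (the parameter values here are arbitrary, chosen only for illustration): drawing each person's rate λ from a Gamma(α, β) and then a Poisson count given that rate reproduces the negative binomial probabilities returned by dnbinom.

  set.seed(1)
  alpha <- 0.5; beta <- 0.25                 # illustrative Gamma shape and rate
  lambda <- rgamma(100000, shape = alpha, rate = beta)
  y <- rpois(100000, lambda)                 # Poisson count given each person's rate
  table(factor(y, levels = 0:5)) / 100000    # simulated P(Y = y)
  dnbinom(0:5, size = alpha, prob = beta / (beta + 1))   # NBD probabilities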
Fitting NBD Model
• R code:

  lnbd2 <- function(alpha, beta, data) {
    visits <- 0:9
    prob <- beta / (beta + 1)
    sum(data[1:10] * log(dnbinom(visits, alpha, prob))) +
      data[11] * log(1 - pnbinom(9, alpha, prob))   # 10+ category
  }
  # fnscale = -1 so that optim maximises the log-likelihood
  optim(c(1, 1), function(param) { lnbd2(param[1], param[2], visit.dist) },
        control = list(fnscale = -1))

• Likelihood maximised for α = 0.157 and β = 0.197

NBD Model Predictions
[Chart: Fit of the NBD Model – observed vs expected number of people by number of visits, 0 to 10+]

NBD Regression
• Can also add deterministic (observed) heterogeneity to the NBD model
• Each person now visits according to an NBD regression model, with the mean linked to their covariates through g(xi)
• Probability distribution:
  P(Yi = y) = [Γ(α + y) / (Γ(α) y!)] (β / (β + e^(g(xi))))^α (e^(g(xi)) / (β + e^(g(xi))))^y
• Reduces to the simple NBD model when g = 0

NBD Regression Estimates

               Coefficient   Std Error   t value
  Intercept       -4.047       1.102      -3.7
  Theta (α)        0.139       0.007      19.1
  log(income)      0.075       0.096       0.8
  Sex             -0.005       0.116      -0.0
  log(age)         0.890       0.141       6.3
  Hhld size       -0.025       0.042      -0.6

• Can also fit the model using maximum likelihood, but this will not give standard errors

NBD Regression Fit
[Chart: Fit of NBD Regression – observed vs expected number of people by number of visits, 0 to 10+]

Covariates In General
• Choose a probability distribution that fits the individual-level outcome variable
  – This has parameters (a.k.a. latent traits) θi
• Think of the individual-level latent traits θi as a function of covariates x
• Incorporate a mixing distribution to capture the remaining heterogeneity in the θi
  – The variation in θi not explained by x
• Fit this model (e.g. using maximum likelihood), as sketched below
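As one concrete version of this recipe, here is a minimal maximum-likelihood sketch of a negative binomial regression for the site-visit data: Poisson counts whose means depend on covariates, with Gamma mixing for the unexplained heterogeneity. It assumes the siteVisits data frame and column names used with glm earlier, and uses R's size/mu parametrisation of dnbinom, which need not match the exact parametrisation on the NBD regression slides above (MASS::glm.nb fits essentially the same model).

  # Negative binomial regression by maximum likelihood:
  # covariate-dependent means exp(x_i' beta) plus Gamma mixing with shape alpha
  negll <- function(par, y, X) {
    alpha <- exp(par[1])              # mixing shape, kept positive via exp()
    mu    <- exp(X %*% par[-1])       # individual-level means
    -sum(dnbinom(y, size = alpha, mu = mu, log = TRUE))
  }
  X <- model.matrix(~ logHouseholdIncome + Sex + logAge + HH.Size, data = siteVisits)
  fit <- optim(rep(0, ncol(X) + 1), negll, y = siteVisits$Visits, X = X,
               control = list(maxit = 2000))
  fit$par                             # log(alpha) followed by the regression coefficients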
New Concepts
• How to incorporate covariates in probability models
  – Poisson, zero-inflated Poisson and NBD regression models for count data
• However, getting the distribution of the outcome variable right was more crucial here than introducing covariates
• The importance of covariates is often exaggerated

Reach and Frequency Models
• Advertising is a major industry
  – NZ ad expenditure reached $1.5bn in 2000
  – Many companies spend millions each year
• Crucial to understand the effects of this expenditure
• Major outcomes include how many people are reached by an ad campaign, and how many times
  – Known as reach and frequency (R&F)
  – Typically the analysis is limited to calculating media exposure, not advertising exposure

Reach and Frequency Models
• Data on TV viewing, newspaper and magazine readership, radio listening etc. is routinely gathered
  – Ratings and readership figures determine the price of space in these media
• However, this data typically does not enable detailed reach and frequency analysis
  – E.g. readership questions ask about the last issue read, and how many of an average 4 issues were read
  – Longitudinal data is collected on TV viewing, but item non-response causes problems with direct analysis
• Models are needed to derive complete reach and frequency analyses from the collected data

R&F Analysis Examples
[Example R&F analysis outputs]

Beta-Binomial Model for R&F
• If an advertiser has placed an ad in each of 10 issues of a magazine, the beta-binomial model assumes that:
  – Each person has a probability p of reading each issue
  – These probabilities follow a beta distribution across people:
    g(p) = p^(α−1) (1 − p)^(β−1) / B(α, β)
  – Each issue is read independently, both within and across individuals
• The distribution of the number of issues read by each person is binomial
• The resulting aggregate exposure distribution is the beta-binomial
• Applied to R&F analysis by Metheringham (1964)
• Still widely used
• But not very accurate

Typical Exposure Distribution
[Chart: typical exposure distribution – number of people (log scale, 1 to 10,000) by number of issues read]

Modified BBM
• One problem with the beta-binomial model is that it does not model loyal viewers/readers/listeners well
• By adding a point mass at 1 to the beta distribution of exposure probabilities, the BBM can be modified to accommodate loyal readers etc. (see the sketch below)
  – Derived by Chandon (1976); improved by Danaher (1988), Austral. J. Statist.
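A minimal sketch of the beta-binomial exposure distribution for a 10-insertion schedule, and of the modification just described (a point mass of loyal readers at p = 1). The a, b and loyal-reader proportion used here are arbitrary illustrative values, not estimates from any real schedule:

  # Beta-binomial: P(read k of n issues) = choose(n,k) B(a+k, b+n-k) / B(a, b)
  dbetabinom <- function(k, n, a, b) {
    choose(n, k) * beta(a + k, b + n - k) / beta(a, b)
  }
  # Modified BBM: mix in a proportion `loyal` of people who read every issue
  dmodbbm <- function(k, n, a, b, loyal) {
    loyal * (k == n) + (1 - loyal) * dbetabinom(k, n, a, b)
  }
  round(dbetabinom(0:10, 10, 0.3, 2), 3)      # exposure distribution
  sum(dbetabinom(1:10, 10, 0.3, 2))           # reach: P(exposed at least once)
  round(dmodbbm(0:10, 10, 0.3, 2, 0.05), 3)   # with 5% loyal readers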
Multiple Media Vehicles
• The BBM (and modified BBM) focus on exposure to one media vehicle (e.g. one magazine) over the course of an ad campaign
• Need to extend to multiple vehicles
  – Model both reading choice and times read, in one combined model
• Could assume independence
  – E.g. the Dirichlet-multinomial model
  – This assumes independence of irrelevant alternatives (IIA)
• But there are known to be correlations between different media vehicles
  – E.g. women's magazines, business papers, programmes on TV1 vs TV3

Multiple Media Vehicles
• Models need to take correlations between media vehicles into account
• Log-linear models have been used
  – But these are computationally intensive for moderately large advertising schedules
• Canonical expansion model (Danaher 1992)
  – Uses Goodhardt and Ehrenberg's "duplication of viewing" law to minimise the need for multivariate correlations
  – Data on pairwise correlations is used, but higher-order joint probabilities are derived using this law; higher-order interactions are assumed to be zero
  – Canonical expansions are used for the joint probabilities to minimise computation

FMCG Sales/Purchasing
• Retail sales figures for fast moving consumer goods
  – Have good aggregate weekly sales figures, with data available down to SKU level and collected at store level
  – Know when total sales are changing over time
  – Can also investigate overall response to promotions
• Using store-level data can give more accurate results, and even allow some segmentation by chain or region
• However, sales figures cannot show us who is buying more when sales increase, or who is affected by promotions
  – Heavy buyers? Light buyers? New buyers?
  – Households with kids? Retired couples? Flatters?
• Even when overall sales are flat, there may be hidden changes
  – Marketing activities could be made more effective using this sort of information, so how can we find out about this?

Household Purchasing Data
• Data about FMCG purchases is collected from a panel of households
  – Can be collected through diaries, or even weekly interviews based on recall
  – The best method is currently to equip the panel with scanners, used by each household member to record all items bought
• ACNielsen (NZ) runs a scanner panel of over 1000 households
  – Data includes amount purchased, price, date, and product details down to SKU level
  – Also have demographic characteristics of each household

Common Research Questions
• Who buys my product?
  – Perhaps better answered by a U&A (usage and attitudes) study
• How much do they buy? How often?
• Who are my heavy buyers? Light buyers? Frequent buyers?
• How many are repeat buyers?
• How does this compare to my other brands? How about my competitors?
• Are my results normal?
  – How do they compare to similar products in other categories?

Example of Purchase Data
[Table: week-by-week purchase records for Households 1–4 over weeks 1–13 and beyond, showing which brand (A or B) was bought in each week; aggregated into 4-week months below]

Results for 4-Week Months

  Purchases by month    1        2        3       …   Total
  Household 1           1A       3A,1B    1A      .   5A,1B
  Household 2           2A       1B       1B      .   2A,2B
  Household 3           1B       -        1A      .   1A,1B
  Household 4           -        -        -       .   -
  Total                 3A,1B    3A,2B    2A,1B   .   8A,4B

Observations
• Usually there will be a wide range of purchasing intensity among buyers of each brand
  – Also a proportion who do not buy the brand
• Instead of a whole brand, we can also look at a brand/package-size combination
  – Similar findings apply at both levels

Another Example
• Data gathered from a panel of 983 households
• Purchases of Lux Flakes over a 12-week period
  – Various summary measures shown below

  # of purchase occasions                    1    2    3   4+   Total
  All buyers:          # buyers             17    3    2    0     22   (100% of buyers; 1.3 purchases per buyer)
                       # units bought       17    6    6    0     29   (100% of units)
  Repeat buyers        # buyers              6    2    1    0      9   (41% of buyers; 1.4 purchases per buyer)
  (bought in last
   12 weeks):          # units bought        6    4    3    0     13   (45% of units)
  "New" buyers         # buyers             11    1    1    0     13   (59% of buyers; 1.2 purchases per buyer)
  (didn't buy in
   last 12 weeks):     # units bought       11    2    3    0     16   (55% of units)
  Cumulative purchases from at least
   this # of purchase occasions             29   12    6    0

Example (continued)
• Low penetration overall
  – 22 buyers, about 2% of the panel
• More than half the purchases were made by "new" buyers
• The cumulative purchasing distribution looks similar to the cumulative reach distributions from the last lecture

Negative Binomial Model
• Fit the NBD model – this assumes a Poisson process for purchase occasions, with Gamma heterogeneity
• R code:

  purchase.dist <- c(961, 17, 3, 2)
  lnbd3 <- function(alpha, beta, data) {
    visits <- 0:3
    prob <- beta / (beta + 1)
    sum(data[1:4] * log(dnbinom(visits, alpha, prob)))
  }
  # fnscale = -1 so that optim maximises the log-likelihood
  optim(c(1, 1), function(param) { lnbd3(param[1], param[2], purchase.dist) },
        control = list(fnscale = -1))

• Likelihood maximised for α = 0.045 and β = 1.514

Negative Binomial Model
• Can also fit the model based on the observed values of two quantities:
  – The proportion of people p0 making no purchases during the study period
  – The mean number of purchases made, m (assuming that only one item is purchased on each purchase occasion)
• Then solve for α and β numerically, as sketched below
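A minimal sketch of this fit, using the Lux Flakes figures above (961 of the 983 panellists made no purchase, and there were 29 purchase occasions in total). With the parametrisation used in lnbd3, p0 = (β/(β+1))^α and m = α/β, so substituting α = mβ leaves one equation to solve for β:

  p0 <- 961 / 983                  # observed proportion making no purchases
  m  <- 29 / 983                   # observed mean purchase occasions per panellist
  f  <- function(beta) (beta / (beta + 1))^(m * beta) - p0
  beta  <- uniroot(f, c(1e-6, 100))$root
  alpha <- m * beta
  c(alpha = alpha, beta = beta)    # close to the ML estimates 0.045 and 1.514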
Multivariate NBD
• Generalise to multiple time periods with durations Ti, i = 1, …, t
• Various partitionings of the Ti lead to variables that are also NBD
  – E.g. divide into the first s time periods and the remaining t − s
  – The values for the latter t − s periods, conditional on those for the first s, are multivariate NBD
  – α is incremented by the total purchases from the first s periods, and the mean is updated as a weighted average of the original mean and the observed mean
• So we can easily apply empirical Bayes techniques using this model

NBD Model for Longer Periods
• Another property of the NBD is that purchases over a longer time period are also NBD (assuming that the purchasing process remains the same)
• The mean number of purchases increases in proportion to the length of the period
• But the parameter α remains fixed

NBD Model
• The NBD model has been applied to products in a wide range of categories
• It generally fits very well
• The main exception (for diary data) is when the recording period is too short compared with the purchase frequency
  – People often record shopping once in each period, rather than multiple times
  – This can cause problems if many people are expected to purchase once or more in each period

α is Usually Constant
• Typically α will be relatively constant across different products in the same category
• This means that the heterogeneity in purchasing rates is similar across products
• However, β will vary to reflect the penetrations of the different products

Multiple Brands
• So far we have only looked at the NBD for purchases of a single brand
• Want to model multiple brands
• Will use a combined model for brand choice and for the number of purchases
  – The NBD-Dirichlet distribution

Model for Brand Choice
• Assume that brand choices are made independently at each purchase, with each individual i having a fixed probability pij of choosing brand j
• These probabilities pij are assumed to vary among people according to a Dirichlet distribution
  – This is a generalisation of the Beta distribution

The Dirichlet Distribution
• Recall that the Beta distribution has pdf
  g(p) = p^(α−1) (1 − p)^(β−1) / B(α, β)
       = [Γ(α1 + α2) / (Γ(α1) Γ(α2))] θ1^(α1−1) θ2^(α2−1)
  setting θ1 = p, θ2 = 1 − p, α1 = α and α2 = β
• The Dirichlet distribution generalises this to k dimensions

Brand Choice
• These assumptions mean that the joint distribution of brands purchased, across all consumers, is a mixture of multinomials with a Dirichlet distribution
  – i.e. a Dirichlet-multinomial distribution
  – For two brands, this is just the familiar Beta-binomial distribution

Purchasing Model
• We now turn to the purchasing process
• Assume that purchases made by individual i occur randomly and independently with mean rate λi, resulting in a Poisson process
• Also assume that the means vary across the population according to a Gamma distribution
• This means that the number of purchases will follow a negative binomial distribution

Combined Model
• Finally, assume that the purchase rates and brand choice probabilities are independent of one another
• The resulting distribution of brand purchases follows an NBD-Dirichlet distribution (often called the Dirichlet for short)
  – This has k + 2 parameters: one for each brand and 2 for the NBD purchase distribution
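A simulation sketch of the combined model just described, with three brands and arbitrary illustrative parameter values (not estimates from any real category): purchase rates are Gamma, category purchases Poisson, and brand choices Dirichlet-multinomial, with the Dirichlet draws obtained by normalising independent Gamma variables.

  set.seed(1)
  n <- 10000                          # panellists
  alpha <- 0.5; beta <- 2             # NBD (Gamma mixing) parameters
  a <- c(1.2, 0.6, 0.2)               # Dirichlet parameters, one per brand
  lambda <- rgamma(n, shape = alpha, rate = beta)
  total  <- rpois(n, lambda)          # category purchase occasions per panellist
  p <- matrix(rgamma(n * 3, shape = rep(a, each = n)), nrow = n)
  p <- p / rowSums(p)                 # Dirichlet brand-choice probabilities
  buys <- t(sapply(seq_len(n), function(i) rmultinom(1, total[i], p[i, ])))
  colMeans(buys > 0)                  # brand penetrations over the period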
Discussion
• This model has been found to describe many aspects of buyer behaviour well, across a wide range of situations
• It assumes that buying behaviour is stationary, i.e. not showing any trends over time
• The Dirichlet distribution assumes that the probabilities are independent apart from the constraint that they sum to 1
  – In marketing terms, this means that the market is not segmented
  – The proportion of purchases that go to one brand is independent of how the remaining purchases are spread across the other brands

Discussion (continued)
• One implication of this is an additivity property
  – Any two brands can be combined into a "super-brand" with expected purchases equal to the sum of the individual brand means
  – The rest of the model is unaffected by this change
• The NBD-Dirichlet has also been applied to pack sizes, stores, TV programmes, etc.

Example of Model Fit
[Chart: example of NBD-Dirichlet model fit]

A Single Brand
• The NBD-Dirichlet model does not give exactly an NBD distribution for a single brand
  – However, in practice the difference is minimal

Duplication Between Brands
• In a table of duplications between pairs of brands, values are generally constant down the columns
  – Reflects the unsegmented nature of most markets

Duplication Law
• The duplication between two brands is usually proportional to the product of their penetrations
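A minimal helper sketch for checking this law on a buyer-by-brand 0/1 matrix, for example the buys matrix from the simulation sketch above or a real panel extract; if the law holds, the off-diagonal ratios are roughly equal to a single duplication constant.

  # buys01: a matrix with one row per panellist and one 0/1 column per brand
  duplication_ratio <- function(buys01) {
    pen <- colMeans(buys01)                    # brand penetrations
    dup <- crossprod(buys01) / nrow(buys01)    # proportion buying each pair of brands
    dup / outer(pen, pen)                      # duplication law: off-diagonals roughly constant
  }
  # e.g. round(duplication_ratio(buys > 0), 2) for the simulated panel above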