Bradley-Terry Model Analysis of Cat Food Recipes

Hongjie Deng1, Daniel R. Jeske2 and Ted Younglove3
Department of Statistics, University of California, Riverside, CA 92521
1Graduate Student, 2Faculty and Director of Collaboratory, 3Manager of Collaboratory
Cinnamon
Wheaties
Introduction
The Del Monte Pet Products Division of Del Monte Foods
conducted palatability studies of dry cat food, wet cat food, and
cat treats using paired comparison consumption tests.
The Statistical Consulting Collaboratory at the University of
California, Riverside was consulted to improve the analysis of
the paired comparison experiments, focusing initially on the
experiments that used dry cat food.
Model Goodness Of Fit
Test
Test Result: p-value = 0.1093. Do not reject Ho at α = 0.05.
How Well The Model Fit
Multiple Comparison Procedure
To identify which food recipes are different with respect to cat preference, the
procedure uses an algorithm to generate hypothetical tables of data under the
null hypothesis Ho: v1 = v2 = v3 = v4 = v5.
Algorithm
Calculate $\hat{v}_i^*$ (i = 1, …, t) for each table
Obtain $Q = \max_{1 \le i < j \le t} |\hat{v}_i^* - \hat{v}_j^*|$ for each table
Compare the Monte-Carlo p-value with α
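The Monte-Carlo procedure can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the poster's estimates came from R, whereas this sketch fits each simulated table with a standard MM (minorization-maximization) iteration, which is an assumption of the example. The names `random_table`, `fit_ratings`, `max_contrast` and `monte_carlo_p_values` are hypothetical. The panel sizes and the observed estimates are copied from the poster's tables.

```python
import math
import random

FOODS = ["A", "B", "C", "D", "E"]
# Cats per pair, read off the data table (29 for A-B, C-D and C-E, 30 otherwise).
N_PAIR = {(0, 1): 29, (0, 2): 30, (0, 3): 30, (0, 4): 30, (1, 2): 30,
          (1, 3): 30, (1, 4): 30, (2, 3): 29, (2, 4): 29, (3, 4): 30}
# Published estimates of v for Foods A..E (Food A is the reference, v = 0).
V_HAT = [0.0, -1.4367, -0.7307, -0.2132, 2.3024]


def random_table(rng):
    """One hypothetical table under Ho: C_ij ~ Bin(n_ij, 0.5), C_ji = n_ij - C_ij."""
    t = len(FOODS)
    wins = [[0] * t for _ in range(t)]
    for (i, j), n in N_PAIR.items():
        c = sum(rng.random() < 0.5 for _ in range(n))
        wins[i][j], wins[j][i] = c, n - c
    return wins


def fit_ratings(wins, tol=1e-8, max_iter=5000):
    """MM iteration for the Bradley-Terry MLE (assumed fitting method).

    Returns ratings v with v[0] = 0. Assumes every food wins at least once,
    which holds with overwhelming probability for these panel sizes.
    """
    t = len(wins)
    s = [1.0] * t
    for _ in range(max_iter):
        new = []
        for i in range(t):
            denom = sum((wins[i][j] + wins[j][i]) / (s[i] + s[j])
                        for j in range(t) if j != i)
            new.append(sum(wins[i]) / denom)
        new = [x / new[0] for x in new]  # fix the reference food at strength 1
        done = max(abs(a - b) for a, b in zip(new, s)) < tol
        s = new
        if done:
            break
    return [math.log(x) for x in s]


def max_contrast(v):
    """Q = largest absolute pairwise difference of the ratings."""
    return max(abs(v[i] - v[j])
               for i in range(len(v)) for j in range(i + 1, len(v)))


def monte_carlo_p_values(n_tables=10_000, seed=0):
    """Return ({(food_i, food_j): p-value}, list of simulated Q values)."""
    rng = random.Random(seed)
    q = [max_contrast(fit_ratings(random_table(rng))) for _ in range(n_tables)]
    p = {}
    for i in range(len(FOODS)):
        for j in range(i + 1, len(FOODS)):
            diff = abs(V_HAT[i] - V_HAT[j])
            p[(FOODS[i], FOODS[j])] = sum(x > diff for x in q) / n_tables
    return p, q
```

Calling `monte_carlo_p_values(n_tables=500)` gives a quick approximation. The default of 10,000 tables matches the poster but takes noticeably longer in pure Python.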
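Data
          Food A  Food B  Food C  Food D  Food E
Food A    -       20      22      20      1
Food B    9       -       6       7       1
Food C    8       24      -       8       2
Food D    10      23      21      -       3
Food E    29      29      27      27      -

Randomly generated table of data under Ho
          Food A    Food B             Food C             Food D             Food E
Food A    -         C12 = Bin(29,0.5)  C13 = Bin(30,0.5)  C14 = Bin(30,0.5)  C15 = Bin(30,0.5)
Food B    29 - C12  -                  C23 = Bin(30,0.5)  C24 = Bin(30,0.5)  C25 = Bin(30,0.5)
Food C    30 - C13  30 - C23           -                  C34 = Bin(29,0.5)  C35 = Bin(29,0.5)
Food D    30 - C14  30 - C24           29 - C34           -                  C45 = Bin(30,0.5)
Food E    30 - C15  30 - C25           29 - C35           30 - C45           -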
D1: 10 panels comparing all pairs of recipes with 30 cats each
D2: 4 panels comparing (A,B), (B,C), (C,D), (D,E), each with 75 cats
D2 has the minimum number of comparisons needed to estimate all the ratings
and is motivated by being a simpler experiment to manage.
Power Comparison
Ho: v1 = v2 = v3 = v4 = v5 (i.e., no difference in recipes)
Ha: not Ho
Power levels for each design of a 5% test of Ho using 1000 simulated data sets
are presented below.
True Ratings           D1      D2
(1,1,1,1,1)            0.055   0.050
(1,1,1,1,1.2)          0.111   0.084
(1,1,1,1.2,1.2)        0.180   0.095
(1,1,1.2,1.2,1.2)      0.178   0.078
(1,1,1,1,1.5)          0.544   0.378
(1,1,1,1.5,1.5)        0.743   0.338
(1,1,1.5,1.5,1.5)      0.776   0.383
(1,1,1,1,1.8)          0.946   0.769
(1,1,1,1.8,1.8)        0.992   0.754
(1,1,1.8,1.8,1.8)      0.993   0.797
(1,1.2,1.2,1.2,1.2)    0.122   0.074
(1,1.5,1.5,1.5,1.5)    0.554   0.365
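A power comparison of this kind can be approximated with a short simulation. The poster does not say which 5% test of Ho was used, so the sketch below substitutes a score test with a chi-square reference distribution on 4 degrees of freedom; that choice is an assumption. So is reading the "true ratings" as values of v on the scale of the Bradley-Terry link function. The helper names are hypothetical, and the resulting powers should be close to the tabulated ones, not identical to them.

```python
import math
import random

CRIT_CHI2_4DF = 9.488  # upper 5% point of a chi-square with 4 df


def prefer_prob(vi, vj):
    """Bradley-Terry probability that treatment i is preferred over j."""
    return 1.0 / (1.0 + math.exp(vj - vi))


def binomial(rng, n, p):
    """Number of successes in n Bernoulli(p) trials."""
    return sum(rng.random() < p for _ in range(n))


def reject_d1(v, rng, n=30):
    """D1: every pair compared by n cats.

    Score statistic (assumed test): 4 / (n t) * sum_i (W_i - N_i / 2)^2,
    where W_i is the total number of wins of food i and N_i = n (t - 1).
    """
    t = len(v)
    wins = [0] * t
    for i in range(t):
        for j in range(i + 1, t):
            c = binomial(rng, n, prefer_prob(v[i], v[j]))
            wins[i] += c
            wins[j] += n - c
    half = n * (t - 1) / 2
    stat = 4.0 / (n * t) * sum((w - half) ** 2 for w in wins)
    return stat > CRIT_CHI2_4DF


def reject_d2(v, rng, n=75):
    """D2: only adjacent pairs (A,B), (B,C), (C,D), (D,E), n cats each.

    Score statistic (assumed test): sum over panels of (C - n/2)^2 / (n/4).
    """
    stat = 0.0
    for i in range(len(v) - 1):
        c = binomial(rng, n, prefer_prob(v[i], v[i + 1]))
        stat += (c - n / 2) ** 2 / (n / 4)
    return stat > CRIT_CHI2_4DF


def power(reject, v, n_sim=1000, seed=0):
    """Fraction of simulated data sets in which Ho is rejected."""
    rng = random.Random(seed)
    return sum(reject(v, rng) for _ in range(n_sim)) / n_sim
```

For example, `power(reject_d1, (1, 1, 1, 1, 1.8))` and `power(reject_d2, (1, 1, 1, 1, 1.8))` estimate the two entries in the row for true ratings (1,1,1,1,1.8).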
Figure: Histogram Of Q (x-axis: Q, y-axis: Frequency)
Note: Cell ij = number of cats who prefer food i over food j
Bradley-Terry Model
Suppose there are t treatments in an experiment involving paired
comparisons. Each pair of treatments is compared by k different
judges.
Observed Frequencies and (Expected Frequencies)
          Food A       Food B       Food C       Food D       Food E
Food A    -            20 (23.43)   22 (20.25)   20 (16.59)   1 (2.73)
Food B    9 (5.57)     -            6 (9.91)     7 (6.82)     1 (0.70)
Food C    8 (9.75)     24 (20.09)   -            8 (10.83)    2 (1.33)
Food D    10 (13.41)   23 (23.18)   21 (18.17)   -            3 (2.24)
Food E    29 (27.27)   29 (29.30)   27 (27.67)   27 (27.76)   -

Two Alternative Designs
Power Analysis
How To Randomly Generate A Table Of Data
Row Food A: C12 = Bin(29, 0.5), C13 = Bin(30, 0.5), C14 = Bin(30, 0.5), C15 = Bin(30, 0.5)
Conclusion
Experimental Design
The experiment was conducted using a colony of 300 cats, male
and female, of various breeds and ages. Each cat in a panel of 30
randomly selected cats was given two different bowls of food on
each of two days. On the first day food A was placed to the left and
food B was placed to the right. On the second day, the left-right
orientation was reversed.
The relative amount of each food, A and B, that the cats consumed
over the two days was used to indicate which food they preferred.
In this poster, we show how to analyze the data and select the food
recipe that is most attractive to the cats.
For each (i, j) pair, calculate Monte-Carlo p-value = (# of tables with $Q > |\hat{v}_i - \hat{v}_j|$) / $10^4$, where $Q = \max_{1 \le i < j \le t} |\hat{v}_i^* - \hat{v}_j^*|$ is obtained for each table
Ho: Bradley-Terry model fits the data
Ha: Bradley-Terry model does not fit the data
Test statistic: $-2(\log L_{Ho} - \log L(\pi_{12}, \ldots, \pi_{t-1,t}))$, where $L_{Ho}$ is the saturated
likelihood function, reduced by the Bradley-Terry link function.
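As a rough check, the statistic can be evaluated from the observed counts and the published ratings. The sketch below assumes the usual large-sample reference distribution, a chi-square with t(t-1)/2 - (t-1) = 6 degrees of freedom; the poster does not state which reference distribution was used. The function names are hypothetical.

```python
import math

# WINS[i][j] = number of cats preferring food i over food j (Foods A..E).
WINS = [
    [0, 20, 22, 20, 1],
    [9, 0, 6, 7, 1],
    [8, 24, 0, 8, 2],
    [10, 23, 21, 0, 3],
    [29, 29, 27, 27, 0],
]
# Published maximum likelihood estimates of v for Foods A..E.
V_HAT = [0.0, -1.4367, -0.7307, -0.2132, 2.3024]


def goodness_of_fit(wins, v):
    """Likelihood-ratio statistic 2 * sum O * log(O / E) and its df.

    E is the expected count under the Bradley-Terry link function. The sum
    equals -2 (log L_Ho - log L_saturated) because the saturated estimate
    of pi_ij is the observed proportion.
    """
    t = len(v)
    stat = 0.0
    for i in range(t):
        for j in range(t):
            if i == j or wins[i][j] == 0:
                continue
            n_ij = wins[i][j] + wins[j][i]
            expected = n_ij / (1.0 + math.exp(v[j] - v[i]))
            stat += 2.0 * wins[i][j] * math.log(wins[i][j] / expected)
    df = t * (t - 1) // 2 - (t - 1)  # assumed degrees of freedom
    return stat, df


def chi2_sf_even_df(x, df):
    """Upper tail of a chi-square with even df:
    exp(-x/2) * sum_{k < df/2} (x/2)^k / k!."""
    half = x / 2.0
    return math.exp(-half) * sum(half ** k / math.factorial(k)
                                 for k in range(df // 2))


stat, df = goodness_of_fit(WINS, V_HAT)
p_value = chi2_sf_even_df(stat, df)
print(f"statistic = {stat:.3f}, df = {df}, p-value = {p_value:.4f}")
```

Under these assumptions the p-value agrees with the reported 0.1093 to about three decimal places.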
Randomly generate a table of data $10^4$ times (see below)
Our goal was to apply Bradley-Terry modeling and analysis
techniques to the experimental data. In particular, we wanted to
estimate a quality score for each food recipe, test whether the
scores were significantly different, and explore the power of
alternative paired comparison designs.
Lana
Wheaties
Histogram Of Q: Pr(Q > 0.6512) = 0.05
Define Pr(treatment i is preferred over treatment j) = $\pi_{ij}$.
Define $r_{ijk}$ = rank of the i-th treatment when compared
with the j-th treatment by judge k.
The saturated likelihood function is $L(\pi_{12}, \ldots, \pi_{t-1,t}) = \prod_{k=1}^{n} \prod_{i<j}^{t} \pi_{ij}^{2 - r_{ijk}} (1 - \pi_{ij})^{2 - r_{jik}}$.
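It may help to see the saturated likelihood written in terms of counts. The derivation below assumes the ranks are coded 1 for the preferred treatment and 2 for the other, so that each exponent is 0 or 1; the poster does not state this coding explicitly. $C_{ij}$ denotes the number of judges who prefer treatment i over treatment j.

```latex
C_{ij} = \sum_{k=1}^{n} (2 - r_{ijk}), \qquad
\log L(\pi_{12}, \ldots, \pi_{t-1,t})
  = \sum_{i<j} \left[ C_{ij} \log \pi_{ij} + C_{ji} \log (1 - \pi_{ij}) \right], \qquad
\hat{\pi}_{ij} = \frac{C_{ij}}{C_{ij} + C_{ji}} .
```

For the Food A versus Food B panel, where 20 of 29 cats preferred Food A, this gives $\hat{\pi}_{12} = 20/29 \approx 0.69$ before the Bradley-Terry link function is imposed.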
The Bradley-Terry link function is $\pi_{ij} = \frac{e^{v_i}}{e^{v_i} + e^{v_j}}$, where $v_i$ = true rating
of treatment i.
Rank treatments based on $p_i = \frac{1}{t-1} \sum_{j=1, j \ne i}^{t} \pi_{ij}$.
Test Results
Figure: line diagram of Food E, Food A, Food D, Food C, Food B
Note: Foods connected by a line are not significantly different at α = 0.05
Estimation Of True Ratings $v_i$
Maximum Likelihood Estimates of the true ratings $v_i$ were obtained
using the R software package and are presented below.
Diet      $\hat{v}_i$   $\hat{p}_i$   Rank
Food E    2.3024        0.9413        1
Food A    0             0.5317        2
Food D    -0.2132       0.4802        3
Food C    -0.7307       0.3535        4
Food B    -1.4367       0.1933        5
Estimates Of $\pi_{ij}$
          Food A  Food B  Food C  Food D  Food E
Food A    -       0.81    0.67    0.55    0.09
Food B    0.19    -       0.33    0.23    0.02
Food C    0.33    0.67    -       0.37    0.05
Food D    0.45    0.77    0.63    -       0.07
Food E    0.91    0.98    0.95    0.93    -
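The ratings, the preference probabilities and the ranking criterion can be reproduced approximately from the observed counts. The poster's estimates were obtained in R; the sketch below instead uses a plain MM (minorization-maximization) iteration, which is an assumption of this example, with Food A fixed at v = 0 as in the table of estimates. The function names are hypothetical.

```python
import math

FOODS = ["A", "B", "C", "D", "E"]
# WINS[i][j] = number of cats preferring food i over food j.
WINS = [
    [0, 20, 22, 20, 1],
    [9, 0, 6, 7, 1],
    [8, 24, 0, 8, 2],
    [10, 23, 21, 0, 3],
    [29, 29, 27, 27, 0],
]


def fit_ratings(wins, tol=1e-10, max_iter=10000):
    """MM iteration for the Bradley-Terry MLE; returns v with v[0] = 0."""
    t = len(wins)
    s = [1.0] * t
    for _ in range(max_iter):
        new = []
        for i in range(t):
            denom = sum((wins[i][j] + wins[j][i]) / (s[i] + s[j])
                        for j in range(t) if j != i)
            new.append(sum(wins[i]) / denom)
        new = [x / new[0] for x in new]  # keep the reference food at strength 1
        done = max(abs(a - b) for a, b in zip(new, s)) < tol
        s = new
        if done:
            break
    return [math.log(x) for x in s]


def preference_matrix(v):
    """pi[i][j] = exp(v_i) / (exp(v_i) + exp(v_j)); the diagonal is None."""
    t = len(v)
    return [[None if i == j else 1.0 / (1.0 + math.exp(v[j] - v[i]))
             for j in range(t)] for i in range(t)]


def average_preference(pi):
    """p_i = average of pi_ij over the other t - 1 treatments."""
    t = len(pi)
    return [sum(pi[i][j] for j in range(t) if j != i) / (t - 1)
            for i in range(t)]


v_hat = fit_ratings(WINS)
pi_hat = preference_matrix(v_hat)
p_hat = average_preference(pi_hat)
ranking = [FOODS[i] for i in sorted(range(len(FOODS)), key=lambda i: -p_hat[i])]
for name, v, p in zip(FOODS, v_hat, p_hat):
    print(f"Food {name}: v = {v:.4f}, p = {p:.4f}")
print("Ranking:", ranking)
```

Ranking by the average preference `p_hat` gives the same order here as ranking by the ratings themselves.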
Comparison Of Standard Deviation Of Contrasts
Contrasts   Standard Deviation
            D1      D2
v1-v2       0.23    0.24
v1-v3       0.23    0.33
v1-v4       0.24    0.42
v1-v5       0.24    0.48
v2-v3       0.23    0.24
v2-v4       0.23    0.34
v2-v5       0.24    0.41
v3-v4       0.24    0.23
v3-v5       0.24    0.33
v4-v5       0.24    0.24
Based on 1000 simulations under Ho. Results are relatively
invariant to what is assumed for the true $v_i$ values.
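The pattern in the simulated standard deviations can be cross-checked against large-sample theory. The sketch below is an asymptotic approximation, not the authors' simulation. It builds the Fisher information for the ratings under Ho, where every panel of n cats contributes n/4, fixes v1 = 0 as the reference, and inverts the reduced matrix. The function names are hypothetical.

```python
import math


def information_matrix(t, pairs, n):
    """Fisher information for v under Ho (all pi_ij = 0.5).

    Each compared pair contributes n * 0.5 * 0.5 = n / 4, which gives a
    weighted Laplacian of the comparison graph.
    """
    info = [[0.0] * t for _ in range(t)]
    w = n * 0.25
    for i, j in pairs:
        info[i][i] += w
        info[j][j] += w
        info[i][j] -= w
        info[j][i] -= w
    return info


def invert(matrix):
    """Gauss-Jordan inverse of a small, well-conditioned square matrix."""
    m = len(matrix)
    aug = [row[:] + [1.0 if r == c else 0.0 for c in range(m)]
           for r, row in enumerate(matrix)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [x / scale for x in aug[col]]
        for r in range(m):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[m:] for row in aug]


def contrast_sd(t, pairs, n):
    """Asymptotic SD of v_i - v_j, keyed by 1-based (i, j), with v_1 = 0."""
    info = information_matrix(t, pairs, n)
    cov = invert([row[1:] for row in info[1:]])

    def c(i, j):
        return 0.0 if i == 0 or j == 0 else cov[i - 1][j - 1]

    return {(i + 1, j + 1): math.sqrt(c(i, i) + c(j, j) - 2 * c(i, j))
            for i in range(t) for j in range(i + 1, t)}


D1_PAIRS = [(i, j) for i in range(5) for j in range(i + 1, 5)]
D2_PAIRS = [(0, 1), (1, 2), (2, 3), (3, 4)]
sd_d1 = contrast_sd(5, D1_PAIRS, n=30)
sd_d2 = contrast_sd(5, D2_PAIRS, n=75)
```

Under this approximation every D1 contrast has a standard deviation of about 0.23, while D2 contrasts grow with the number of panels separating the two foods, from about 0.23 for adjacent foods to about 0.46 for v1-v5.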
Conclusion
Although D2 is simpler to manage, it has less power than D1. The loss of
information from using fewer panels is not compensated for by using more cats
in each panel. Contrasts under D1 have equal precision while for D2 they
do not.
Special Thanks To: Javier Suarez and Hua Yu of the UCR Statistical Consulting
Collaboratory. Graduate Students of the Fall 2005 offering of STAT 293: Yingtao Bi,
Mike Huang, Steward Huang, Sungsu Kim, Scott Lesch, Rupam Pal, Jose Sanchez,
Jason Wilson, Rui Xiao, Karen Huaying Xu, and Qi Zhang.