Characterization of products

advertisement
BECHAUX Camille
DESPRES Yoann
HOLZMANN Céline
Sensory analysis: Characterization of
products by using uni and
multidimensional approach
I Description of data
II Unidimentionnal analysis
III Multidimentionnal analysis
Introduction :
Characterization of products has two main goals:
–
–
To characterize the products with numeric variables.
To classify the variables which best characterize the products.
I Description of data
We used a data frame which contains:
-12 orange juices
-12 panelists (Students)
-2 sessions
-10 attributes (sweet, pulpous, color intensity, sour, bitter, attack intensity, taste typicity, taste,
smell, smell intensity)
For each attribute each panel gave a score between 0 and 10.
The design is a complete and balance for the rank and the carry-over effects.
Acide
8
8
9
Sucree
4
5
6
6
7
II Unidimentionnal analysis
2
3
4
R code for visualisation of data:
2
x11()
boxprod(jus0506, col.p = 7, firstvar = 9, numr = 2, numc = 2)
1 2 3 4 5 6 7 8 9
11
1 2 3 4 5 6 7 8 9
11
In order to visualize the data for each attribute, you can print the boxplots. It’s also useful to
be sure that there is no mistake in the dataframe.
8
6
4
2
2
4
6
8
10
Pulpeux
Pulpous
10
Amere
Bitter
1 2 3 4 5 6 7 8 9
11
1 2 3 4 5 6 7 8 9
11
In this example, the observation of the boxplots shows that the attribute “pulpous” seems to
be very discriminating, whereas the attribute “bitter” is not. When you look at all the
boxplots, you can see that the attribute pulpous is the only one which seems to be very
discriminatory. But to make sure of witch attribute is significantly discriminatory, it is better
to realize a variance analysis.
R code for an analysis of variance:
model (AOV): " attribute = product + panelist + session + interactions "
res<-decat(jus0506,formul="~produit*numero_juge*seanceproduit:numero_juge:seance", firstvar = 9, proba=1, graph = TRUE)
barrow(t(res$tabT), numr = 3, numc = 3)
barrow(res$coeff, color = "orange")
Based on analysis of variance the “decat” method characterizes the products with all the
attributes.
intensit
G.Apprec
Tupicité
Pulpeux
-1
0
1
1
Those results are also summarized in the following frame.
2
2
3
3
intensit
Tupicité
G.Apprec
G.Apprec
Sucree
Intensit
O.Appréc
O.Intens
intensit
Pulpeux
3
G.Apprec
Tupicité
Pulpeux
Amere
Acide
Sucree
Intensit
O.Appréc
O.Intens
intensit
Tupicité
2
3
3
Pulpeux
G.Apprec
1
2
2
3
Amere
0
1
1
2
Amere
Tupicité
-1
-2
0
0
1
12
Acide
-1
-2
-1
-2
0
The “decat” function does an analysis of variance with the following model for each
attribute.
Model: attribute = product + panelist + session + interactions.
Acide
Pulpeux
Sucree
Amere
Intensit
Acide
O.Appréc
Sucree
O.Intens
Intensit
intensit
O.Appréc
O.Intens
intensit
G.Apprec
Tupicité
Pulpeux
G.Apprec
Amere
Tupicité
Acide
Pulpeux
Sucree
Amere
Intensit
Acide
O.Appréc
Sucree
O.Intens
Intensit
intensit
O.Appréc
O.Intens
-1
-2
6
Amere
Acide
-2
0
11
Sucree
Intensit
O.Appréc
O.Intens
intensit
G.Apprec
Tupicité
Pulpeux
-1
5
Amere
Acide
Sucree
Intensit
O.Appréc
O.Intens
-2
-2
-2
-1
-1
When the bars of the barplot are upper or lower than two standard deviations that means that
this product is significantly discriminated by this attribute.
In this example, the 5th product is not characterized by any attributes. On contrary the 12th is
7
8
characterized by pulpous (a low content of pulp) and by its low scores for the global
appreciation.
0
0
1
1
2
2
Ajusted mean
Intensité.Attaqueintensité_couleur
O.Intensité
Pulpeux
4.917
4.792
1.573
4.25
5
7.167
5.458
1.5
4.833
5.167
5.625
5.083
3.875
5.583
5.292
5.917
2
3.708
4.125
5.042
7
3.833
5.417
6
3.583
8
Amere
Acide
12
5.708
6.583
6.292
4
4.833
5.5
5
4.708
9
O.Appréciation T upicitéG.orange
Sucree
G.Appreciation
3.375
4.292
2.708
4.667
4.542
5.167
4.542
3.75
5.167
4.583
5.125
4.375
5.125
1.458
4.833
4.958
5.458
4.792
3.792
5
5.833
3.958
4
5.458
4.083
5.343
5.708
5.125
1.667
5.333
5
5.958
5.542
4.833
5.207
5.917
4.208
5.625
5.417
4.438
6.042
4.917
3.875
4.375
4.625
5.375
5.042
1.75
4.917
5.25
5.917
5.208
10
3.833
5.042
5.333
5.708
4.75
6.458
5
5.667
6.333
5.708
1
2.958
4.375
4.833
4.75
5.375
7.25
5.333
5.167
5.917
5.625
11
3.25
4.667
5.708
5.375
5.375
7.792
6
6
6.083
6.125
3
3.833
4.458
5.083
5.625
5.292
7.917
6.042
6.25
5.833
6.417
The decat method also gives this frame of adjusted mean by product.
The mean of each attribute by product is compared to the general mean, and the color shows
if the mean is significant (based on the P-value):


under the global mean (pink)
upper the global mean (blue)
In this example, we can characterize a product:
The product number 12 is significantly characterized by high marks for the attributes "Bitter",
"Sour", "attack intensity" and significantly characterized by low means for the attributes
"color intensity", "pulpous", "Odor intensity", "typicity of the orange taste", "sweet", "Global
appreciation". This juice can be characterized as a “strong” one, and is not really liked by
panelists.
Based on this frame, we can also group products by vague categories:
The orange juices 1, 11, 3 and 10 are globally characterized by a huge typicity of the orange
taste, a good global appreciation, a hight sweetie taste, are very pulpous and on the contrary a
low bitterness and a pale color : Those are globally “sweet” juices, and they are quite liked by
the panelists.
The orange juices 7, 2, 8, 6 and 9 are characterized by intermediate values for quite all the
attributes. They are intermediate juice, and they are liked by panelist but not as much as
“sweet” juices.
The orange juices 5, 12 and 4 are, on the contrary of the first group, characterized by a low
typicity of orange taste, a low global appreciation, a not very sweety taste, a low amount of
pulpous, and on the contrary a high attack intensity, a high bitterness, acidity and a deep
color. Those are stong juices, and they the ones with the lower global appreciation.
To conclude with this frame the panellists don’t like “hard juice”, and that they prefer sweeter
one.
The decat function can also create, just by adding some R code, some graphics of others
results that was created.
R code to add:
barrow(t(res$tabT), numr = 3, numc = 3)
5
Intensité.Attaque
Attack
intensity
8
9
10
11
12
8
9
10
11
12
3
4
5
6
7
1
2
10
5
0
8
9
10
11
12
3
4
5
6
7
1
2
12
9
10
11
7
8
4
5
6
2
3
1
12
10
11
7
8
9
5
6
2
3
4
-10 -5
0
5
10
Typicity
of orange
TupicitéG.orange
-10 -5
0
5
10
Pulpous
Pulpeux
-10 -5
3
4
5
6
7
10
5
0
12
9
10
11
7
8
4
5
6
2
3
1
-10 -5
0
5
10
Acide
Sour
-10 -5
12
10
11
7
8
9
5
6
2
3
4
1
-10 -5
0
5
10
Sucree
Sweety
Bitter
Amere
1
1
2
12
9
10
11
7
8
4
5
6
2
3
1
-10 -5
0
5
0
-10 -5
12
10
11
7
8
9
5
6
2
3
4
1
-10 -5
0
5
10
O.Appréciation
Appreciation
of odor
10
O.Intensité
Odor
intensity
10
intensité_couleur
Color intensity
It created a barplot of the T-statistics for a given product and a given attribute.
For exemple the attribute “Pulpous” is good to discriminate products (high bars) whereas
“Odor intensity” has little bars, the 12 products are not very different regarding this attribute.
When a bar is in the upper half, the product has a high value for this attribute.
Finally, the decat function is a good tool to characterize some products, as one product also as
a member of a group of products to compare.
R code for Panel Performance :
res2=panelperf(jus0506, firstvar = 9, formul = "~produit*numero_juge*seanceproduit:numero_juge:seance")
coltable(magicsort(res2$p.value, sort.mat = res2$p.value[,1], bycol = FALSE,
method = "median"), main.title = "Panel performance (sorted by product P-value)"
Panel performance (sorted by product P-value)
produit
numero_juge
seance
produit:numero_juge
produit:seance
numero_juge:seance
median
Pulpeux
1.243e-65
1.992e-09
0.4451
0.9196
9.899e-05
0.3089
0.1545
intensité_couleur
1.684e-11
3.794e-19
0.04006
0.00629
0.05204
0.3821
0.02317
Amere
1.35e-07
7.045e-39
0.8832
0.00305
0.9418
5.324e-05
0.001552
G.Appreciation
1.424e-07
9.979e-20
0.777
6.821e-09
0.1543
0.1861
0.07715
TupicitéG.orange
3.304e-06
1.361e-12
0.9891
0.0001227
0.8241
0.2769
0.1385
Sucree
1.114e-05
9.951e-26
0.9243
0.2386
0.7036
0.4657
0.3522
Acide
2.28e-05
4.934e-27
0.2362
0.001097
0.5349
0.001915
0.001506
O.Appréciation
0.0001361
1.848e-19
0.8802
0.01138
0.01854
0.1682
0.01496
Intensité.Attaque
0.0368
3.795e-07
0.116
0.02018
0.8009
0.01302
0.02849
O.Intensité
0.1148
1.607e-10
0.1161
0.8071
0.3215
0.1165
0.1163
The paneplperf function allows the creation of this frame, in which we can see the
significance of the different effects and interactions witch are tested in the model, like
“produit” (product), “séance” (session) or “juge” (panelist).
A significant effect (<0.05) will appear in pink.
All the attributes are significant to discriminate the products except the “O.intensité”
(intensity's smell).
On the contrary, the “seance” effect (session) has no impact on the attributes except for
intensite-couleur (color intensity). We can imagine that the luminosity was different from a
session to the other one.
Conclusion on the unidimentionnal analysis.
To conclude, we can say that with the unidimensionnal analysis it’s possible to see that there
are two big groups of products, the “sweet” and the “hard”. Each group is characterized by
opposed attributes.
The product number 12 seems to be very different from the others, because it has a lot of
attributes that are significant.
On the contrary, the product number 7, 6 and 8 are not very characterized by those attributes.
III Multidimentionnal analysis
R code for the panellipse method
res <- panellipse(jus0506, col.p = 7, col.j = 3, firstvar = 9)
coltable(res$hotelling, main.title = "P-values for the Hotelling's T2 tests")
The panellipse function makes a PCA on the table with the mean of the scores per product and
per attribute of the data and it gives the following factorial design.
Each color represents a product. The square is the mean point, and the other points represent
the products viewed by each panelist.
The product perception by each panelist was different. It seems that there is no real difference
between some products. So it could be useful to draw confidence ellipses.
5
Individual description
4
7
0
3
11
5
12
10
6
8
1
-5
2
-10
Dim 2 (17.25%)
9
-10
-5
0
Dim 1 (52.88%)
5
4
Confidence ellipses for the mean points
2
4
7
0
3
11
5
12
10
6
8
1
-2
Dim 2 (17.25%)
9
-6
-4
2
-8
-6
-4
-2
0
2
4
Dim 1 (52.88%)
Those ellipses were made by the panellipse function, in this way:
*The SensoMineR method panellips creates a virtual panel by random drawing from the
original panel.
*The gravity center of the panel perception of products is added on the design as a
supplementary element.
*Another panel is drawn, and this is done 500 times.
The 95% closest points of the gravity center of the representation of the original panel are
kept and this draws the ellipse.
Thanks to these ellipses we can see that the 12th product is well discriminated. The fist axis
splits the products number 12 from the others.
The products number 5, 6, 7, 8, 9 and 10 are closed to the origin of the design. This means
that those products are not well discriminated by those attributes.
The second axis opposes the products 2 and 4. We can’t say just with this design if they are
well discriminated from the others, because ellipses are intersected, but they are not very
overlap. To find it out, the panellipse method does the Hotelling’s T2 tests.
Those results confirm what we saw in the unidimensionnal analysis. It gives almost the
same information on which product is well discriminated that the frame “adjusted
mean”, but it is more clear to read.
P-values for the Hotelling's T2 tests
1
2
3
4
5
6
7
8
9
10
11
12
1
1
0.02643
0.1723
3.863e-05
0.00393
0.2219
0.009469
0.174
0.005823
0.2772
0.2077
4.983e-07
2
0.02643
1
4.083e-06
4.877e-07
0.0009731
0.004293
3.865e-05
0.000771
4.572e-05
0.0001007
1.692e-05
9.89e-07
3
0.1723
4.083e-06
1
1.658e-05
0.0001883
0.01116
0.03652
0.04595
0.00504
0.1209
0.9461
1.196e-08
4
3.863e-05
4.877e-07
1.658e-05
1
0.01497
0.0006617
0.01257
2.286e-06
0.1061
0.0003343
3.047e-05
8.563e-06
5
0.00393
0.0009731
0.0001883
0.01497
1
0.1086
0.3273
0.01672
0.5592
0.004288
0.002173
3.157e-06
6
0.2219
0.004293
0.01116
0.0006617
0.1086
1
0.09266
0.858
0.09241
0.3115
0.04552
9.663e-07
7
0.009469
3.865e-05
0.03652
0.01257
0.3273
0.09266
1
0.01095
0.664
0.1905
0.04411
5.146e-05
8
0.174
0.000771
0.04595
2.286e-06
0.01672
0.858
0.01095
1
0.0218
0.6835
0.1244
2.925e-08
9
0.005823
4.572e-05
0.00504
0.1061
0.5592
0.09241
0.664
0.0218
1
0.06608
0.01438
0.0001133
10
0.2772
0.0001007
0.1209
0.0003343
0.004288
0.3115
0.1905
0.6835
0.06608
1
0.3187
3.267e-08
11
0.2077
1.692e-05
0.9461
3.047e-05
0.002173
0.04552
0.04411
0.1244
0.01438
0.3187
1
1.347e-07
12
4.983e-07
9.89e-07
1.196e-08
8.563e-06
3.157e-06
9.663e-07
5.146e-05
2.925e-08
0.0001133
3.267e-08
1.347e-07
1
In each box we can find the p-value of the test between the two products. If the test is
significant the box is colored in pink.
So we can see that products 2 is well discriminated from all the other juices.
While the product number 4 is not well discriminated from 9 whereas it is well discriminated
from the others.
Those results are more clear than the ones we get in the unidimensional analysis (in
order to compare products), because all tests are in a synthetic frame.
In the same way we can represent the variability around the variables in the PCA made with
virtual panels.
0.5
1.0
Variables factor map (PCA)
intensité_couleur
O.Intensité
O.Appréciation
Intensité.Attaque
Sucree
Acide
Amere
Pulpeux
TupicitéG.orange
G.Appreciation
Acide
intensité_couleur
O.Appréciation
TupicitéG.orange
G.Appreciation
Amere
Intensité.Attaque
0.0
Dim 2 (17.25%)
O.Intensité
Sucree
-1.0
-0.5
Pulpeux
-1.0
-0.5
0.0
0.5
1.0
Dim 1 (52.88%)
A variable is well represented when the arrow is long enough to nearly reach the circle. When
all the points (witches are the ones that were used to draw the ellipses) are closed from each
other, it means that there is a consensus for the attribute (example: the pulpous).
On the contrary, when the points are scattered, that means that there is a huge variability in
the perception of the products by attributes.
The product number 12 is characterized by a high bitterness and attack intensity, and a low
pulpous, sweetie, global appreciation … Product number 2 and 4 are opposed on the color
intensity.
Those results are the same than the ones we get in the unidimensional analysis (in order
to characterize products by attributes). But the multidimensionnal analysis is more
global.
Conclusion:
To conclude, we have seen that the multidimensional and the unidimentional analyses
are quite complementary. In fact we can notice that thanks to the multidimensional analysis
it’s easier to see quickly which products are significantly different from the others and to see
with a global vision why. But in order to see more specifically which criterion makes this
product is special the unidimensional analysis is more adapted.
Download