BECHAUX Camille DESPRES Yoann HOLZMANN Céline Sensory analysis: Characterization of products by using uni and multidimensional approach I Description of data II Unidimentionnal analysis III Multidimentionnal analysis Introduction : Characterization of products has two main goals: – – To characterize the products with numeric variables. To classify the variables which best characterize the products. I Description of data We used a data frame which contains: -12 orange juices -12 panelists (Students) -2 sessions -10 attributes (sweet, pulpous, color intensity, sour, bitter, attack intensity, taste typicity, taste, smell, smell intensity) For each attribute each panel gave a score between 0 and 10. The design is a complete and balance for the rank and the carry-over effects. Acide 8 8 9 Sucree 4 5 6 6 7 II Unidimentionnal analysis 2 3 4 R code for visualisation of data: 2 x11() boxprod(jus0506, col.p = 7, firstvar = 9, numr = 2, numc = 2) 1 2 3 4 5 6 7 8 9 11 1 2 3 4 5 6 7 8 9 11 In order to visualize the data for each attribute, you can print the boxplots. It’s also useful to be sure that there is no mistake in the dataframe. 8 6 4 2 2 4 6 8 10 Pulpeux Pulpous 10 Amere Bitter 1 2 3 4 5 6 7 8 9 11 1 2 3 4 5 6 7 8 9 11 In this example, the observation of the boxplots shows that the attribute “pulpous” seems to be very discriminating, whereas the attribute “bitter” is not. When you look at all the boxplots, you can see that the attribute pulpous is the only one which seems to be very discriminatory. But to make sure of witch attribute is significantly discriminatory, it is better to realize a variance analysis. R code for an analysis of variance: model (AOV): " attribute = product + panelist + session + interactions " res<-decat(jus0506,formul="~produit*numero_juge*seanceproduit:numero_juge:seance", firstvar = 9, proba=1, graph = TRUE) barrow(t(res$tabT), numr = 3, numc = 3) barrow(res$coeff, color = "orange") Based on analysis of variance the “decat” method characterizes the products with all the attributes. intensit G.Apprec Tupicité Pulpeux -1 0 1 1 Those results are also summarized in the following frame. 2 2 3 3 intensit Tupicité G.Apprec G.Apprec Sucree Intensit O.Appréc O.Intens intensit Pulpeux 3 G.Apprec Tupicité Pulpeux Amere Acide Sucree Intensit O.Appréc O.Intens intensit Tupicité 2 3 3 Pulpeux G.Apprec 1 2 2 3 Amere 0 1 1 2 Amere Tupicité -1 -2 0 0 1 12 Acide -1 -2 -1 -2 0 The “decat” function does an analysis of variance with the following model for each attribute. Model: attribute = product + panelist + session + interactions. Acide Pulpeux Sucree Amere Intensit Acide O.Appréc Sucree O.Intens Intensit intensit O.Appréc O.Intens intensit G.Apprec Tupicité Pulpeux G.Apprec Amere Tupicité Acide Pulpeux Sucree Amere Intensit Acide O.Appréc Sucree O.Intens Intensit intensit O.Appréc O.Intens -1 -2 6 Amere Acide -2 0 11 Sucree Intensit O.Appréc O.Intens intensit G.Apprec Tupicité Pulpeux -1 5 Amere Acide Sucree Intensit O.Appréc O.Intens -2 -2 -2 -1 -1 When the bars of the barplot are upper or lower than two standard deviations that means that this product is significantly discriminated by this attribute. In this example, the 5th product is not characterized by any attributes. On contrary the 12th is 7 8 characterized by pulpous (a low content of pulp) and by its low scores for the global appreciation. 0 0 1 1 2 2 Ajusted mean Intensité.Attaqueintensité_couleur O.Intensité Pulpeux 4.917 4.792 1.573 4.25 5 7.167 5.458 1.5 4.833 5.167 5.625 5.083 3.875 5.583 5.292 5.917 2 3.708 4.125 5.042 7 3.833 5.417 6 3.583 8 Amere Acide 12 5.708 6.583 6.292 4 4.833 5.5 5 4.708 9 O.Appréciation T upicitéG.orange Sucree G.Appreciation 3.375 4.292 2.708 4.667 4.542 5.167 4.542 3.75 5.167 4.583 5.125 4.375 5.125 1.458 4.833 4.958 5.458 4.792 3.792 5 5.833 3.958 4 5.458 4.083 5.343 5.708 5.125 1.667 5.333 5 5.958 5.542 4.833 5.207 5.917 4.208 5.625 5.417 4.438 6.042 4.917 3.875 4.375 4.625 5.375 5.042 1.75 4.917 5.25 5.917 5.208 10 3.833 5.042 5.333 5.708 4.75 6.458 5 5.667 6.333 5.708 1 2.958 4.375 4.833 4.75 5.375 7.25 5.333 5.167 5.917 5.625 11 3.25 4.667 5.708 5.375 5.375 7.792 6 6 6.083 6.125 3 3.833 4.458 5.083 5.625 5.292 7.917 6.042 6.25 5.833 6.417 The decat method also gives this frame of adjusted mean by product. The mean of each attribute by product is compared to the general mean, and the color shows if the mean is significant (based on the P-value): under the global mean (pink) upper the global mean (blue) In this example, we can characterize a product: The product number 12 is significantly characterized by high marks for the attributes "Bitter", "Sour", "attack intensity" and significantly characterized by low means for the attributes "color intensity", "pulpous", "Odor intensity", "typicity of the orange taste", "sweet", "Global appreciation". This juice can be characterized as a “strong” one, and is not really liked by panelists. Based on this frame, we can also group products by vague categories: The orange juices 1, 11, 3 and 10 are globally characterized by a huge typicity of the orange taste, a good global appreciation, a hight sweetie taste, are very pulpous and on the contrary a low bitterness and a pale color : Those are globally “sweet” juices, and they are quite liked by the panelists. The orange juices 7, 2, 8, 6 and 9 are characterized by intermediate values for quite all the attributes. They are intermediate juice, and they are liked by panelist but not as much as “sweet” juices. The orange juices 5, 12 and 4 are, on the contrary of the first group, characterized by a low typicity of orange taste, a low global appreciation, a not very sweety taste, a low amount of pulpous, and on the contrary a high attack intensity, a high bitterness, acidity and a deep color. Those are stong juices, and they the ones with the lower global appreciation. To conclude with this frame the panellists don’t like “hard juice”, and that they prefer sweeter one. The decat function can also create, just by adding some R code, some graphics of others results that was created. R code to add: barrow(t(res$tabT), numr = 3, numc = 3) 5 Intensité.Attaque Attack intensity 8 9 10 11 12 8 9 10 11 12 3 4 5 6 7 1 2 10 5 0 8 9 10 11 12 3 4 5 6 7 1 2 12 9 10 11 7 8 4 5 6 2 3 1 12 10 11 7 8 9 5 6 2 3 4 -10 -5 0 5 10 Typicity of orange TupicitéG.orange -10 -5 0 5 10 Pulpous Pulpeux -10 -5 3 4 5 6 7 10 5 0 12 9 10 11 7 8 4 5 6 2 3 1 -10 -5 0 5 10 Acide Sour -10 -5 12 10 11 7 8 9 5 6 2 3 4 1 -10 -5 0 5 10 Sucree Sweety Bitter Amere 1 1 2 12 9 10 11 7 8 4 5 6 2 3 1 -10 -5 0 5 0 -10 -5 12 10 11 7 8 9 5 6 2 3 4 1 -10 -5 0 5 10 O.Appréciation Appreciation of odor 10 O.Intensité Odor intensity 10 intensité_couleur Color intensity It created a barplot of the T-statistics for a given product and a given attribute. For exemple the attribute “Pulpous” is good to discriminate products (high bars) whereas “Odor intensity” has little bars, the 12 products are not very different regarding this attribute. When a bar is in the upper half, the product has a high value for this attribute. Finally, the decat function is a good tool to characterize some products, as one product also as a member of a group of products to compare. R code for Panel Performance : res2=panelperf(jus0506, firstvar = 9, formul = "~produit*numero_juge*seanceproduit:numero_juge:seance") coltable(magicsort(res2$p.value, sort.mat = res2$p.value[,1], bycol = FALSE, method = "median"), main.title = "Panel performance (sorted by product P-value)" Panel performance (sorted by product P-value) produit numero_juge seance produit:numero_juge produit:seance numero_juge:seance median Pulpeux 1.243e-65 1.992e-09 0.4451 0.9196 9.899e-05 0.3089 0.1545 intensité_couleur 1.684e-11 3.794e-19 0.04006 0.00629 0.05204 0.3821 0.02317 Amere 1.35e-07 7.045e-39 0.8832 0.00305 0.9418 5.324e-05 0.001552 G.Appreciation 1.424e-07 9.979e-20 0.777 6.821e-09 0.1543 0.1861 0.07715 TupicitéG.orange 3.304e-06 1.361e-12 0.9891 0.0001227 0.8241 0.2769 0.1385 Sucree 1.114e-05 9.951e-26 0.9243 0.2386 0.7036 0.4657 0.3522 Acide 2.28e-05 4.934e-27 0.2362 0.001097 0.5349 0.001915 0.001506 O.Appréciation 0.0001361 1.848e-19 0.8802 0.01138 0.01854 0.1682 0.01496 Intensité.Attaque 0.0368 3.795e-07 0.116 0.02018 0.8009 0.01302 0.02849 O.Intensité 0.1148 1.607e-10 0.1161 0.8071 0.3215 0.1165 0.1163 The paneplperf function allows the creation of this frame, in which we can see the significance of the different effects and interactions witch are tested in the model, like “produit” (product), “séance” (session) or “juge” (panelist). A significant effect (<0.05) will appear in pink. All the attributes are significant to discriminate the products except the “O.intensité” (intensity's smell). On the contrary, the “seance” effect (session) has no impact on the attributes except for intensite-couleur (color intensity). We can imagine that the luminosity was different from a session to the other one. Conclusion on the unidimentionnal analysis. To conclude, we can say that with the unidimensionnal analysis it’s possible to see that there are two big groups of products, the “sweet” and the “hard”. Each group is characterized by opposed attributes. The product number 12 seems to be very different from the others, because it has a lot of attributes that are significant. On the contrary, the product number 7, 6 and 8 are not very characterized by those attributes. III Multidimentionnal analysis R code for the panellipse method res <- panellipse(jus0506, col.p = 7, col.j = 3, firstvar = 9) coltable(res$hotelling, main.title = "P-values for the Hotelling's T2 tests") The panellipse function makes a PCA on the table with the mean of the scores per product and per attribute of the data and it gives the following factorial design. Each color represents a product. The square is the mean point, and the other points represent the products viewed by each panelist. The product perception by each panelist was different. It seems that there is no real difference between some products. So it could be useful to draw confidence ellipses. 5 Individual description 4 7 0 3 11 5 12 10 6 8 1 -5 2 -10 Dim 2 (17.25%) 9 -10 -5 0 Dim 1 (52.88%) 5 4 Confidence ellipses for the mean points 2 4 7 0 3 11 5 12 10 6 8 1 -2 Dim 2 (17.25%) 9 -6 -4 2 -8 -6 -4 -2 0 2 4 Dim 1 (52.88%) Those ellipses were made by the panellipse function, in this way: *The SensoMineR method panellips creates a virtual panel by random drawing from the original panel. *The gravity center of the panel perception of products is added on the design as a supplementary element. *Another panel is drawn, and this is done 500 times. The 95% closest points of the gravity center of the representation of the original panel are kept and this draws the ellipse. Thanks to these ellipses we can see that the 12th product is well discriminated. The fist axis splits the products number 12 from the others. The products number 5, 6, 7, 8, 9 and 10 are closed to the origin of the design. This means that those products are not well discriminated by those attributes. The second axis opposes the products 2 and 4. We can’t say just with this design if they are well discriminated from the others, because ellipses are intersected, but they are not very overlap. To find it out, the panellipse method does the Hotelling’s T2 tests. Those results confirm what we saw in the unidimensionnal analysis. It gives almost the same information on which product is well discriminated that the frame “adjusted mean”, but it is more clear to read. P-values for the Hotelling's T2 tests 1 2 3 4 5 6 7 8 9 10 11 12 1 1 0.02643 0.1723 3.863e-05 0.00393 0.2219 0.009469 0.174 0.005823 0.2772 0.2077 4.983e-07 2 0.02643 1 4.083e-06 4.877e-07 0.0009731 0.004293 3.865e-05 0.000771 4.572e-05 0.0001007 1.692e-05 9.89e-07 3 0.1723 4.083e-06 1 1.658e-05 0.0001883 0.01116 0.03652 0.04595 0.00504 0.1209 0.9461 1.196e-08 4 3.863e-05 4.877e-07 1.658e-05 1 0.01497 0.0006617 0.01257 2.286e-06 0.1061 0.0003343 3.047e-05 8.563e-06 5 0.00393 0.0009731 0.0001883 0.01497 1 0.1086 0.3273 0.01672 0.5592 0.004288 0.002173 3.157e-06 6 0.2219 0.004293 0.01116 0.0006617 0.1086 1 0.09266 0.858 0.09241 0.3115 0.04552 9.663e-07 7 0.009469 3.865e-05 0.03652 0.01257 0.3273 0.09266 1 0.01095 0.664 0.1905 0.04411 5.146e-05 8 0.174 0.000771 0.04595 2.286e-06 0.01672 0.858 0.01095 1 0.0218 0.6835 0.1244 2.925e-08 9 0.005823 4.572e-05 0.00504 0.1061 0.5592 0.09241 0.664 0.0218 1 0.06608 0.01438 0.0001133 10 0.2772 0.0001007 0.1209 0.0003343 0.004288 0.3115 0.1905 0.6835 0.06608 1 0.3187 3.267e-08 11 0.2077 1.692e-05 0.9461 3.047e-05 0.002173 0.04552 0.04411 0.1244 0.01438 0.3187 1 1.347e-07 12 4.983e-07 9.89e-07 1.196e-08 8.563e-06 3.157e-06 9.663e-07 5.146e-05 2.925e-08 0.0001133 3.267e-08 1.347e-07 1 In each box we can find the p-value of the test between the two products. If the test is significant the box is colored in pink. So we can see that products 2 is well discriminated from all the other juices. While the product number 4 is not well discriminated from 9 whereas it is well discriminated from the others. Those results are more clear than the ones we get in the unidimensional analysis (in order to compare products), because all tests are in a synthetic frame. In the same way we can represent the variability around the variables in the PCA made with virtual panels. 0.5 1.0 Variables factor map (PCA) intensité_couleur O.Intensité O.Appréciation Intensité.Attaque Sucree Acide Amere Pulpeux TupicitéG.orange G.Appreciation Acide intensité_couleur O.Appréciation TupicitéG.orange G.Appreciation Amere Intensité.Attaque 0.0 Dim 2 (17.25%) O.Intensité Sucree -1.0 -0.5 Pulpeux -1.0 -0.5 0.0 0.5 1.0 Dim 1 (52.88%) A variable is well represented when the arrow is long enough to nearly reach the circle. When all the points (witches are the ones that were used to draw the ellipses) are closed from each other, it means that there is a consensus for the attribute (example: the pulpous). On the contrary, when the points are scattered, that means that there is a huge variability in the perception of the products by attributes. The product number 12 is characterized by a high bitterness and attack intensity, and a low pulpous, sweetie, global appreciation … Product number 2 and 4 are opposed on the color intensity. Those results are the same than the ones we get in the unidimensional analysis (in order to characterize products by attributes). But the multidimensionnal analysis is more global. Conclusion: To conclude, we have seen that the multidimensional and the unidimentional analyses are quite complementary. In fact we can notice that thanks to the multidimensional analysis it’s easier to see quickly which products are significantly different from the others and to see with a global vision why. But in order to see more specifically which criterion makes this product is special the unidimensional analysis is more adapted.