INTERPRETATIONS OF PC’S USING SCREE PLOT

The data we have chosen is the PIZZA dataset, which has 9 variables:
1. Brands: A, B, C, D and E
2. ID of the pizza
3. Moisture content in the pizza
4. Protein content in the pizza
5. Fat content in the pizza
6. Ash content in the pizza
7. Sodium percentage in the pizza
8. Carbohydrate content in the pizza
9. Total calories in the pizza

We first standardized the variables and obtained the X matrix. The following picture is a depiction of the data:

To observe the correlation between the variables, we plotted the correlation plot:

The following interpretations can be made using the correlation plot:
1. The pairs (protein, ash), (fat, sodium), (fat, ash), (ash, sodium) and (fat, calories) show a positive correlation greater than 0.7, which is plausible since these contents tend to increase together in the making of pizza.
2. The pairs (moisture, calories) and (protein, carbs) show a negative correlation below -0.7, which is also plausible in the making of pizza.

We now move to obtaining the principal components. The correlation plot of the first three principal components and the variables is as follows:

Except for moisture and calories, all variables load mainly on the first principal component. In contrast, moisture and calories load mainly on the second one. The interpretation becomes clearer once we see the scree plot:

As the scree plot suggests, the first two components explain more than 94% of the variance. This matches the way the variables appear in the principal components. We can choose k=3, that is, the first three principal components, which together explain more than 98.5% of the variance in the data.

The coefficients of PC1, PC2 and PC3 in terms of the variables appear this way:

Plotting PC1 vs PC2

Here we can observe that some of the observations with a lower value of PC1 have a higher value of PC2, and vice versa. This is explained by the way the variables are correlated.
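
The steps described in this section (standardizing the variables, computing the correlation matrix, extracting the principal components, and reading the explained variance off the scree plot to choose k) can be sketched roughly as follows. This is only an illustrative sketch, not the code used for the analysis: the data here are synthetic stand-ins (random numbers with seven numeric columns, assuming brand and ID are left out of the PCA), and the names pca_from_correlation and choose_k are hypothetical helpers introduced for this example.

```python
import numpy as np

def pca_from_correlation(data):
    """Standardize the columns, then eigendecompose the correlation matrix."""
    data = np.asarray(data, dtype=float)
    # Standardize: zero mean and unit sample standard deviation per column.
    X = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    # Correlation matrix of the original variables.
    R = (X.T @ X) / (X.shape[0] - 1)
    # eigh returns eigenvalues in ascending order; reorder to descending.
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Proportion of variance per component (the heights in a scree plot).
    explained = eigvals / eigvals.sum()
    # Correlation between each variable (rows) and each component (columns).
    loadings = eigvecs * np.sqrt(np.clip(eigvals, 0.0, None))
    # Principal component scores of each observation.
    scores = X @ eigvecs
    return X, R, eigvals, eigvecs, explained, loadings, scores

def choose_k(explained, threshold):
    """Smallest k whose cumulative explained variance reaches the threshold."""
    cumulative = np.cumsum(explained)
    k = int(np.searchsorted(cumulative, threshold)) + 1
    return min(k, len(explained))

# Synthetic stand-in data (NOT the pizza dataset): 300 rows, 7 numeric
# columns driven by 2 latent factors plus a little noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(300, 2))
mixing = rng.normal(size=(2, 7))
data = latent @ mixing + 0.1 * rng.normal(size=(300, 7))

X, R, eigvals, eigvecs, explained, loadings, scores = pca_from_correlation(data)
k = choose_k(explained, 0.94)  # 0.94 is just an example threshold
print("proportion of variance:", np.round(explained, 3))
print("cumulative:", np.round(np.cumsum(explained), 3))
print("components kept:", k)
```

Plotting explained against the component index gives the scree plot, and plotting scores[:, 0] against scores[:, 1] gives the PC1 vs PC2 picture. Because the components are computed from the correlation matrix, the eigenvalues sum to the number of variables, so each proportion is simply the eigenvalue divided by that count.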