Uploaded by mayuri wabale

INTERPRETATION OF PCs USING SCREE PLOT

The data we have chosen is the PIZZA dataset, which has 9 variables:
1. Brand: A, B, C, D and E
2. ID of the Pizza
3. Moisture content in the Pizza
4. Protein weightage in Pizza
5. Fat content in Pizza
6. Ash
7. Sodium percentage in Pizza
8. Carbohydrate content in Pizza
9. Total calories in the Pizza
We first standardized the variables and obtained the X matrix. The following
picture depicts the data:
To examine the correlations between the variables, we drew a correlation plot:
The following interpretations can be made from the correlation plot:
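For readers who want to reproduce the standardization step, the following is a minimal sketch. It assumes that "standardized" means z-scoring each numeric column (subtract the column mean, divide by the sample standard deviation). The function name standardize and the small array are illustrative only; the numbers are made up and are not taken from the PIZZA dataset.

```python
import numpy as np

def standardize(data):
    """Centre each column to mean 0 and scale it to unit sample standard deviation."""
    data = np.asarray(data, dtype=float)
    mean = data.mean(axis=0)
    std = data.std(axis=0, ddof=1)  # ddof=1 gives the sample standard deviation
    return (data - mean) / std

# Made-up measurements (rows = observations, columns = three numeric variables).
# These values are NOT the PIZZA data; they only exercise the function.
toy = np.array([
    [28.0, 21.0, 44.0],
    [30.5, 20.1, 41.2],
    [49.0, 13.5, 28.0],
    [51.2, 12.9, 27.1],
    [33.0,  8.0, 18.5],
])

X = standardize(toy)
print(X.mean(axis=0).round(12))   # column means after standardizing
print(X.std(axis=0, ddof=1))      # column standard deviations after standardizing
```

Standardizing matters here because the variables are on different scales (percentages versus calories), and PCA on unscaled data would be dominated by whichever variable has the largest variance.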
1. The pairs (protein, ash), (fat, sodium), (fat, ash), (ash, sodium) and
(fat, calories) show a positive correlation greater than 0.7, which is
plausible since these contents tend to increase together in the making
of pizza.
2. The pairs (moisture, calories) and (protein, carbs) show a negative
correlation below -0.7, which is also plausible in the making of pizza.
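The reading of the correlation plot above can also be done numerically. The sketch below lists every pair of columns whose correlation exceeds 0.7 in absolute value. The helper name strong_pairs, the column names x1 to x4 and the toy numbers are assumptions for illustration, not the PIZZA data.

```python
import numpy as np

def strong_pairs(data, names, threshold=0.7):
    """Return (name_i, name_j, r) for every column pair with |r| > threshold."""
    corr = np.corrcoef(data, rowvar=False)  # columns are the variables
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(corr[i, j]) > threshold:
                pairs.append((names[i], names[j], float(corr[i, j])))
    return pairs

# Made-up columns: x2 rises with x1, x3 falls as x1 rises, x4 is unrelated.
toy = np.array([
    [1.0,  2.0, 5.0,  1.0],
    [2.0,  4.0, 4.0, -1.0],
    [3.0,  6.0, 3.0,  1.0],
    [4.0,  8.0, 2.0, -1.0],
    [5.0, 11.0, 1.0,  1.0],
])
names = ["x1", "x2", "x3", "x4"]

for a, b, r in strong_pairs(toy, names):
    sign = "positive" if r > 0 else "negative"
    print(f"({a}, {b}): {sign}, r = {r:.3f}")
```

With the real data, names would be the seven numeric column labels, and the positive and negative lists in the text correspond to the sign of r.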
We now move to obtaining the principal components.
The correlation plot of the first three principal components and variables is as
follows:
Except for moisture and calories, all variables load strongly on the first
principal component.
By contrast, only moisture and calories load strongly on the second one.
The interpretation becomes clearer once we look at the scree plot:
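The correlations between variables and principal components described above can be computed directly. For standardized data, the correlation between variable j and component k equals the j-th entry of the k-th eigenvector of the correlation matrix multiplied by the square root of the k-th eigenvalue. The sketch below checks this identity on synthetic data; the data, the random seed and all variable names are assumptions, not the PIZZA data.

```python
import numpy as np

# Synthetic stand-in data: 4 columns driven by two hidden factors (NOT the PIZZA data).
rng = np.random.default_rng(0)
f1, f2 = rng.normal(size=(2, 200))
raw = np.column_stack([
    f1 + 0.2 * rng.normal(size=200),
    f1 + 0.3 * rng.normal(size=200),
    f2 + 0.2 * rng.normal(size=200),
    -f2 + 0.3 * rng.normal(size=200),
])

# Standardize, then eigendecompose the correlation matrix.
X = (raw - raw.mean(axis=0)) / raw.std(axis=0, ddof=1)
R = np.corrcoef(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(R)      # eigh returns eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]         # reorder so PC1 has the largest eigenvalue
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Entry (j, k): correlation of variable j with component k.
var_pc_corr = eigvecs * np.sqrt(eigvals)

# Cross-check against correlations computed from the component scores themselves.
scores = X @ eigvecs
direct = np.corrcoef(X, scores, rowvar=False)[:4, 4:]

print(var_pc_corr.round(3))
print(np.allclose(var_pc_corr, direct))
```

A variable "loads strongly" on a component when the corresponding entry of var_pc_corr is close to +1 or -1. The sign of a whole column is arbitrary, since an eigenvector can be flipped without changing the analysis.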
As the scree plot suggests, the first two components explain more than 94% of the
variance. This is consistent with how the variables load on the first two
principal components.
We can choose k=3, that is, we keep the first three principal components, which
together explain more than 98.5% of the variance.
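The choice of k from a scree plot rests on simple arithmetic: each eigenvalue divided by the sum of all eigenvalues gives the share of variance explained by that component, and the running total gives the cumulative share. The sketch below illustrates this with made-up eigenvalues; the function names explained_variance and choose_k are hypothetical, and the numbers are not the eigenvalues of the PIZZA data.

```python
import numpy as np

def explained_variance(eigenvalues):
    """Return per-component and cumulative shares of total variance."""
    ev = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]  # largest first
    ratio = ev / ev.sum()
    return ratio, np.cumsum(ratio)

def choose_k(eigenvalues, target):
    """Smallest number of components whose cumulative share reaches target."""
    _, cumulative = explained_variance(eigenvalues)
    k = int(np.searchsorted(cumulative, target) + 1)
    return min(k, len(cumulative))  # guard against floating-point shortfall at 1.0

# Made-up eigenvalues for illustration only.
made_up = [4.0, 2.0, 0.5, 0.3, 0.2]
ratio, cumulative = explained_variance(made_up)

print(ratio.round(3))              # these are the heights plotted in a scree plot
print(cumulative.round(3))
print(choose_k(made_up, 0.90))     # components needed to reach 90% of the variance
```

Plotting ratio against the component index gives the scree plot itself; the "elbow" is where the remaining shares become small and roughly equal.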
The coefficients of PC1, PC2 and PC3 in terms of the variables are as
follows:
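A table of coefficients like the one referred to here can be obtained as the leading eigenvectors of the correlation matrix: each column holds the weights that combine the standardized variables into one component. The sketch below is one way to build it, under the assumption that the components come from the correlation matrix of the standardized data. The function name pc_coefficients and the sign convention (largest coefficient in each column made positive) are choices made for this illustration, and the data are synthetic.

```python
import numpy as np

def pc_coefficients(X, k):
    """Return a (variables x k) matrix whose columns are the first k PC coefficient vectors."""
    R = np.corrcoef(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)          # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:k]         # indices of the k largest
    coeffs = eigvecs[:, order]
    # Eigenvector signs are arbitrary; flip each column so that its
    # largest-magnitude coefficient is positive (an illustrative convention).
    idx = np.abs(coeffs).argmax(axis=0)
    signs = np.sign(coeffs[idx, np.arange(k)])
    return coeffs * signs

# Synthetic stand-in data with 5 columns (NOT the PIZZA data).
rng = np.random.default_rng(1)
base = rng.normal(size=(150, 2))
data = np.column_stack([
    base[:, 0],
    base[:, 0] + 0.4 * rng.normal(size=150),
    base[:, 1],
    base[:, 1] + 0.4 * rng.normal(size=150),
    rng.normal(size=150),
])

C = pc_coefficients(data, k=3)
print(C.round(3))                 # rows = variables, columns = PC1, PC2, PC3
print((C.T @ C).round(6))         # close to the identity: columns are orthonormal
```

Each component is then the weighted sum of the standardized variables with these coefficients, so a large coefficient in absolute value means that variable contributes heavily to the component.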
Plotting PC1 vs PC2
Here we can observe that some of the observations with lower values of PC1 have
higher values of PC2, and vice versa. This reflects the way the variables are
correlated.
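The points in a PC1 vs PC2 plot are the component scores, obtained by multiplying the standardized data by the first two coefficient vectors. The sketch below computes them on synthetic data (all names and numbers are assumptions, not the PIZZA data). It also checks a general property worth keeping in mind when reading such a plot: over the whole sample, PC1 and PC2 scores are uncorrelated by construction, so any pattern of low PC1 going with high PC2 applies to particular groups of points rather than to the sample as a whole.

```python
import numpy as np

# Synthetic stand-in data (NOT the PIZZA data).
rng = np.random.default_rng(2)
latent = rng.normal(size=(120, 2))
raw = np.column_stack([
    latent[:, 0] + 0.3 * rng.normal(size=120),
    latent[:, 0] - 0.5 * latent[:, 1],
    latent[:, 1] + 0.3 * rng.normal(size=120),
    rng.normal(size=120),
])

# Standardize and eigendecompose the correlation matrix.
X = (raw - raw.mean(axis=0)) / raw.std(axis=0, ddof=1)
eigvals, eigvecs = np.linalg.eigh(np.corrcoef(X, rowvar=False))
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Scores on the first two components: one (PC1, PC2) point per observation.
scores = X @ eigvecs[:, :2]

print(scores[:5].round(3))                                   # first few points of the plot
print(np.corrcoef(scores[:, 0], scores[:, 1])[0, 1])          # numerically zero
print(scores.var(axis=0, ddof=1).round(4), eigvals[:2].round(4))
```

A scatter plot of scores[:, 0] against scores[:, 1], coloured by a grouping variable such as brand, would show whether groups separate along either component.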