Regression Assignment

Catherine Dumas
Dima Kassab
Anthony Leonardi

EPSY 887
10 March 2011

Regressions were run on cereal data obtained in R (the UScereal data set from the MASS package; see Appendix Q). The raw data are shown in Appendix A. Column 1 (manufacturer) and column 11 (vitamins) were removed before the analysis was performed. The resulting data set was called “cereal9.” A summary of the data is included in Appendix A for completeness, but will not be discussed further.

Appendix B shows the scatter plot, including the regression line, for the first simple linear regression performed. The predictor (independent) variable was fat, and the criterion (dependent) variable was calories. The regression summary for this model appears below the graph in Appendix B. From this information, the regression equation is:

Predicted calories = 117.60 + 22.36 × Fat

The intercept tells us that if the fat content is zero, the predicted caloric content is 117.60. The slope tells us that each additional unit of fat is associated with an increase of 22.36 in predicted calories. The value 3.854 reported next to the slope is its standard error, not a term in the equation. The spread of the data around the line is given by the residual standard error, 50.78 on 63 degrees of freedom. Adding more predictors (multiple regression) may decrease this value as more of the variability is accounted for. The scatter plot in Appendix B shows a moderate spread around the regression line (best fit). It also shows one outlier.

The t(63) statistic (5.803) is significant at p < 0.001, as is the F(1,63) statistic (33.67). These both tell us that fat is a significant predictor of calories. The 95% confidence interval for the slope (14.66, 30.06) does not include zero, and thus also shows the result to be significant. With 95% confidence, the population slope falls between these two values. Also, according to Cumming and Maillardet (2006), a 95% confidence interval captures, on average, about 83.4% of the means of future replications. R² for this regression is 0.3483, with an adjusted R² of 0.338.
This value indicates that about 35% of the variation in calories can be explained by fat. To compare this regression with other regressions on the same criterion variable, we need to look at the adjusted R² (0.338), which penalizes R² for the number of predictors in the model. In this case, we see that about 34% of the variability in calories is explained by fat content. One other value is important here: the correlation of fat with calories is 0.59, which is fairly high, and its square gives the R² above (0.59² ≈ 0.35).

The second regression used carbohydrates as the predictor (independent) variable and kept calories as the criterion (dependent) variable. Appendix C shows the scatter plot, including the regression line, for the second simple linear regression. As before, the regression summary appears below the graph in Appendix C. From this information, the regression equation is:

Predicted calories = 33.34 + 5.81 × Carbohydrates

The intercept tells us that if the carbohydrate content is zero, the predicted caloric content is 33.34. The slope tells us that each additional unit of carbohydrate is associated with an increase of 5.81 in predicted calories. The value 0.5708 reported next to the slope is its standard error, not a term in the equation. The spread of the data around the line is given by the residual standard error, 38.67 on 63 degrees of freedom, which is smaller than in the first regression. Adding more predictors (multiple regression) may decrease this value as more of the variability is accounted for. The scatter plot in Appendix C shows a small to moderate spread around the regression line (best fit). It also shows two possible outliers.

The t(63) statistic (10.183) is significant at p < 0.001, as is the F(1,63) statistic (103.7). These both tell us that carbohydrate content is a significant predictor of calories. The 95% confidence interval for the slope (4.67, 6.95) does not include zero, and thus also shows the result to be significant. With 95% confidence, the population slope falls between these two values.
Also, as stated for the first regression, Cumming and Maillardet (2006) show that a 95% confidence interval captures, on average, about 83.4% of the means of future replications. R² for this second regression is 0.6221, with an adjusted R² of 0.6161. Both values indicate that about 62% of the variation in calories can be explained by carbohydrates. If we compare the two R² values, we see that more of the variability in calories is explained by carbohydrates than by fat.

The correlation matrix (Appendix D) shows the correlations among all 9 variables. Most of the correlations are small, but the correlation of protein with calories is large (0.71), as are carbohydrates with calories (0.79), potassium with protein (0.84), fibre with protein (0.81), and potassium with fibre (0.96). Some are more moderate: fat with calories (0.59), sodium with calories (0.53), sodium with protein (0.57), and others. The rest are fairly small, about 0.2-0.4.

The covariance matrix (Appendix E) illustrates how the variables covary with each other. Because covariances depend on the units of the variables, some are quite large (13,109 for sodium and potassium, for example), while some are less than 1 (0.45 for shelf and fat).

Appendix F shows the data summary for the data set used (cereal9). Appendix G shows the data density plot. What we found interesting about this plot was the clumping of the data for most of the variables. It appears that sugars vary widely in cereals!

The next three appendices show plots of the variables in pairs. In Appendix H, the Gpairs plot is shown. This shows data density by pairs. Thus, calories vs. protein is shown in the first square next to “calories” in the top row. Note the grouping of most of the data, except in some of the “sugars” panes. Appendix I shows another plot of the pairs, but this time the points are larger and easier to see at a quick glance. Appendix J illustrates a data overlay plot. The density data is overlaid with the loess data, described below. This shows variability in the data points with the loess data for each pair.
The histograms (Appendix K) are another method used to visualize the data. We were able to generate all 9 graphs in one figure, which allows us to see all distributions at once. Most of the data are clumped toward the y-axis (positively skewed). However, as before, the sugars plot (#7) immediately jumps out as having a different distribution (bimodal?).

Appendix L shows the loess data, with the demo plot illustrated in Appendix M. The predictive capability of this plot is evident. Appendix N shows the loess prediction data, with two 3D plots in the next two appendices. Appendix O shows the plot calculated as a product (using *), whereas Appendix P is additive (using +). We don’t see much difference between the two plots. Appendix Q lists the R code used in this paper.

Conclusions

Regression is a useful technique for prediction. It allows estimation of the variability in one or more criterion variables when using one or multiple predictor variables. Also, the adjusted R² gives an estimate of how much variability is accounted for by the predictors used, so more predictors can be added and another regression run in an attempt to increase this value and lower the residual error of the regression. Additionally, using R allows many different types of graphical analyses which other programs may not offer.

Reference

Cumming, G., & Maillardet, R. (2006). Confidence intervals and replication: Where will the next mean fall? Psychological Methods, 11(3), 217-227.
Appendix A: Raw Data

                                       calories    protein       fat    sodium
100% Bran                             212.12121 12.1212121 3.0303030 393.93939
All-Bran                              212.12121 12.1212121 3.0303030 787.87879
All-Bran with Extra Fiber             100.00000  8.0000000 0.0000000 280.00000
Apple Cinnamon Cheerios               146.66667  2.6666667 2.6666667 240.00000
Apple Jacks                           110.00000  2.0000000 0.0000000 125.00000
Basic 4                               173.33333  4.0000000 2.6666667 280.00000
Bran Chex                             134.32836  2.9850746 1.4925373 298.50746
Bran Flakes                           134.32836  4.4776119 0.0000000 313.43284
Cap'n'Crunch                          160.00000  1.3333333 2.6666667 293.33333
Cheerios                               88.00000  4.8000000 1.6000000 232.00000
Cinnamon Toast Crunch                 160.00000  1.3333333 4.0000000 280.00000
Clusters                              220.00000  6.0000000 4.0000000 280.00000
Cocoa Puffs                           110.00000  1.0000000 1.0000000 180.00000
Corn Chex                             110.00000  2.0000000 0.0000000 280.00000
Corn Flakes                           100.00000  2.0000000 0.0000000 290.00000
Corn Pops                             110.00000  1.0000000 0.0000000  90.00000
Count Chocula                         110.00000  1.0000000 1.0000000 180.00000
Cracklin' Oat Bran                    220.00000  6.0000000 6.0000000 280.00000
Crispix                               110.00000  2.0000000 0.0000000 220.00000
Crispy Wheat & Raisins                133.33333  2.6666667 1.3333333 186.66667
Double Chex                           133.33333  2.6666667 0.0000000 253.33333
Froot Loops                           110.00000  2.0000000 1.0000000 125.00000
Frosted Flakes                        146.66667  1.3333333 0.0000000 266.66667
Frosted Mini-Wheats                   125.00000  3.7500000 0.0000000   0.00000
Fruit & Fibre: Dates Walnuts and Oats 179.10448  4.4776119 2.9850746 238.80597
Fruitful Bran                         179.10448  4.4776119 0.0000000 358.20896
Fruity Pebbles                        146.66667  1.3333333 1.3333333 180.00000
Golden Crisp                          113.63636  2.2727273 0.0000000  51.13636
Golden Grahams                        146.66667  1.3333333 1.3333333 373.33333
Grape Nuts Flakes                     113.63636  3.4090909 1.1363636 159.09091
Grape-Nuts                            440.00000 12.0000000 0.0000000 680.00000
Great Grains Pecan                    363.63636  9.0909091 9.0909091 227.27273
Honey Graham Ohs                      120.00000  1.0000000 2.0000000 220.00000
Honey Nut Cheerios                    146.66667  4.0000000 1.3333333 333.33333
Honey-comb                             82.70677  0.7518797 0.0000000 135.33835
Just Right Fruit & Nut                186.66667  4.0000000 1.3333333 226.66667
Kix                                    73.33333  1.3333333 0.6666667 173.33333
Life                                  149.25373  5.9701493 2.9850746 223.88060
Lucky Charms                          110.00000  2.0000000 1.0000000 180.00000
Mueslix Crispy Blend                  238.80597  4.4776119 2.9850746 223.88060
Multi-Grain Cheerios                  100.00000  2.0000000 1.0000000 220.00000
Nut&Honey Crunch                      179.10448  2.9850746 1.4925373 283.58209
Nutri-Grain Almond-Raisin             208.95522  4.4776119 2.9850746 328.35821
Oatmeal Raisin Crisp                  260.00000  6.0000000 4.0000000 340.00000
Post Nat. Raisin Bran                 179.10448  4.4776119 1.4925373 298.50746
Product 19                            100.00000  3.0000000 0.0000000 320.00000
Puffed Rice                            50.00000  1.0000000 0.0000000   0.00000
Quaker Oat Squares                    200.00000  8.0000000 2.0000000 270.00000
Raisin Bran                           160.00000  4.0000000 1.3333333 280.00000
Raisin Nut Bran                       200.00000  6.0000000 4.0000000 280.00000
Raisin Squares                        180.00000  4.0000000 0.0000000   0.00000
Rice Chex                              97.34513  0.8849558 0.0000000 212.38938
Rice Krispies                         110.00000  2.0000000 0.0000000 290.00000
Shredded Wheat 'n'Bran                134.32836  4.4776119 0.0000000   0.00000
Shredded Wheat spoon size             134.32836  4.4776119 0.0000000   0.00000
Smacks                                146.66667  2.6666667 1.3333333  93.33333
Special K                             110.00000  6.0000000 0.0000000 230.00000
Total Corn Flakes                     110.00000  2.0000000 1.0000000 200.00000
Total Raisin Bran                     140.00000  3.0000000 1.0000000 190.00000
Total Whole Grain                     100.00000  3.0000000 1.0000000 200.00000
Triples                               146.66667  2.6666667 1.3333333 333.33333
Trix                                  110.00000  1.0000000 1.0000000 140.00000
Wheat Chex                            149.25373  4.4776119 1.4925373 343.28358
Wheaties                              100.00000  3.0000000 1.0000000 200.00000
Wheaties Honey Gold                   146.66667  2.6666667 1.3333333 266.66667

                                          fibre    carbo    sugars shelf
100% Bran                             30.303030 15.15152 18.181818     3
All-Bran                              27.272727 21.21212 15.151515     3
All-Bran with Extra Fiber             28.000000 16.00000  0.000000     3
Apple Cinnamon Cheerios                2.000000 14.00000 13.333333     1
Apple Jacks                            1.000000 11.00000 14.000000     2
Basic 4                                2.666667 24.00000 10.666667     3
Bran Chex                              5.970149 22.38806  8.955224     1
Bran Flakes                            7.462687 19.40299  7.462687     3
Cap'n'Crunch                           0.000000 16.00000 16.000000     2
Cheerios                               1.600000 13.60000  0.800000     1
Cinnamon Toast Crunch                  0.000000 17.33333 12.000000     2
Clusters                               4.000000 26.00000 14.000000     3
Cocoa Puffs                            0.000000 12.00000 13.000000     2
Corn Chex                              0.000000 22.00000  3.000000     1
Corn Flakes                            1.000000 21.00000  2.000000     1
Corn Pops                              1.000000 13.00000 12.000000     2
Count Chocula                          0.000000 12.00000 13.000000     2
Cracklin' Oat Bran                     8.000000 20.00000 14.000000     3
Crispix                                1.000000 21.00000  3.000000     3
Crispy Wheat & Raisins                 2.666667 14.66667 13.333333     3
Double Chex                            1.333333 24.00000  6.666667     3
Froot Loops                            1.000000 11.00000 13.000000     2
Frosted Flakes                         1.333333 18.66667 14.666667     1
Frosted Mini-Wheats                    3.750000 17.50000  8.750000     2
Fruit & Fibre: Dates Walnuts and Oats  7.462687 17.91045 14.925373     3
Fruitful Bran                          7.462687 20.89552 17.910448     3
Fruity Pebbles                         0.000000 17.33333 16.000000     2
Golden Crisp                           0.000000 12.50000 17.045455     1
Golden Grahams                         0.000000 20.00000 12.000000     2
Grape Nuts Flakes                      3.409091 17.04545  5.681818     3
Grape-Nuts                            12.000000 68.00000 12.000000     3
Great Grains Pecan                     9.090909 39.39394 12.121212     3
Honey Graham Ohs                       1.000000 12.00000 11.000000     2
Honey Nut Cheerios                     2.000000 15.33333 13.333333     1
Honey-comb                             0.000000 10.52632  8.270677     1
Just Right Fruit & Nut                 2.666667 26.66667 12.000000     3
Kix                                    0.000000 14.00000  2.000000     2
Life                                   2.985075 17.91045  8.955224     2
Lucky Charms                           0.000000 12.00000 12.000000     2
Mueslix Crispy Blend                   4.477612 25.37313 19.402985     3
Multi-Grain Cheerios                   2.000000 15.00000  6.000000     1
Nut&Honey Crunch                       0.000000 22.38806 13.432836     2
Nutri-Grain Almond-Raisin              4.477612 31.34328 10.447761     3
Oatmeal Raisin Crisp                   3.000000 27.00000 20.000000     3
Post Nat. Raisin Bran                  8.955224 16.41791 20.895522     3
Product 19                             1.000000 20.00000  3.000000     3
Puffed Rice                            0.000000 13.00000  0.000000     3
Quaker Oat Squares                     4.000000 28.00000 12.000000     3
Raisin Bran                            6.666667 18.66667 16.000000     2
Raisin Nut Bran                        5.000000 21.00000 16.000000     3
Raisin Squares                         4.000000 30.00000 12.000000     3
Rice Chex                              0.000000 20.35398  1.769912     1
Rice Krispies                          0.000000 22.00000  3.000000     1
Shredded Wheat 'n'Bran                 5.970149 28.35821  0.000000     1
Shredded Wheat spoon size              4.477612 29.85075  0.000000     1
Smacks                                 1.333333 12.00000 20.000000     2
Special K                              1.000000 16.00000  3.000000     1
Total Corn Flakes                      0.000000 21.00000  3.000000     3
Total Raisin Bran                      4.000000 15.00000 14.000000     3
Total Whole Grain                      3.000000 16.00000  3.000000     3
Triples                                0.000000 28.00000  4.000000     3
Trix                                   0.000000 13.00000 12.000000     2
Wheat Chex                             4.477612 25.37313  4.477612     1
Wheaties                               3.000000 17.00000  3.000000     1
Wheaties Honey Gold                    1.333333 21.33333 10.666667     1

                                      potassium
100% Bran                             848.48485
All-Bran                              969.69697
All-Bran with Extra Fiber             660.00000
Apple Cinnamon Cheerios                93.33333
Apple Jacks                            30.00000
Basic 4                               133.33333
Bran Chex                             186.56716
Bran Flakes                           283.58209
Cap'n'Crunch                           46.66667
Cheerios                               84.00000
Cinnamon Toast Crunch                  60.00000
Clusters                              210.00000
Cocoa Puffs                            55.00000
Corn Chex                              25.00000
Corn Flakes                            35.00000
Corn Pops                              20.00000
Count Chocula                          65.00000
Cracklin' Oat Bran                    320.00000
Crispix                                30.00000
Crispy Wheat & Raisins                160.00000
Double Chex                           106.66667
Froot Loops                            30.00000
Frosted Flakes                         33.33333
Frosted Mini-Wheats                   125.00000
Fruit & Fibre: Dates Walnuts and Oats 298.50746
Fruitful Bran                         283.58209
Fruity Pebbles                         33.33333
Golden Crisp                           45.45455
Golden Grahams                         60.00000
Grape Nuts Flakes                      96.59091
Grape-Nuts                            360.00000
Great Grains Pecan                    303.03030
Honey Graham Ohs                       45.00000
Honey Nut Cheerios                    120.00000
Honey-comb                             26.31579
Just Right Fruit & Nut                126.66667
Kix                                    26.66667
Life                                  141.79104
Lucky Charms                           55.00000
Mueslix Crispy Blend                  238.80597
Multi-Grain Cheerios                   90.00000
Nut&Honey Crunch                       59.70149
Nutri-Grain Almond-Raisin             194.02985
Oatmeal Raisin Crisp                  240.00000
Post Nat. Raisin Bran                 388.05970
Product 19                             45.00000
Puffed Rice                            15.00000
Quaker Oat Squares                    220.00000
Raisin Bran                           320.00000
Raisin Nut Bran                       280.00000
Raisin Squares                        220.00000
Rice Chex                              26.54867
Rice Krispies                          35.00000
Shredded Wheat 'n'Bran                208.95522
Shredded Wheat spoon size             179.10448
Smacks                                 53.33333
Special K                              55.00000
Total Corn Flakes                      35.00000
Total Raisin Bran                     230.00000
Total Whole Grain                     110.00000
Triples                                80.00000
Trix                                   25.00000
Wheat Chex                            171.64179
Wheaties                              110.00000
Wheaties Honey Gold                    80.00000

Data Summary:

> summary(UScereal)
 mfr       calories        protein             fat            sodium
 G:22   Min.   : 50.0   Min.   : 0.7519   Min.   :0.000   Min.   :  0.0
 K:21   1st Qu.:110.0   1st Qu.: 2.0000   1st Qu.:0.000   1st Qu.:180.0
 N: 3   Median :134.3   Median : 3.0000   Median :1.000   Median :232.0
 P: 9   Mean   :149.4   Mean   : 3.6837   Mean   :1.423   Mean   :237.8
 Q: 5   3rd Qu.:179.1   3rd Qu.: 4.4776   3rd Qu.:2.000   3rd Qu.:290.0
 R: 5   Max.   :440.0   Max.   :12.1212   Max.   :9.091   Max.   :787.9
     fibre            carbo           sugars          shelf
 Min.   : 0.000   Min.   :10.53   Min.   : 0.00   Min.   :1.000
 1st Qu.: 0.000   1st Qu.:15.00   1st Qu.: 4.00   1st Qu.:1.000
 Median : 2.000   Median :18.67   Median :12.00   Median :2.000
 Mean   : 3.871   Mean   :19.97   Mean   :10.05   Mean   :2.169
 3rd Qu.: 4.478   3rd Qu.:22.39   3rd Qu.:14.00   3rd Qu.:3.000
 Max.   :30.303   Max.   :68.00   Max.   :20.90   Max.   :3.000
   potassium         vitamins
 Min.   : 15.0   100%    : 5
 1st Qu.: 45.0   enriched:57
 Median : 96.6   none    : 3
 Mean   :159.1
 3rd Qu.:220.0
 Max.   :969.7

Appendix B: Regression 1

Scatter Plot

Regression 1 Summary:

Coefficients:
(Intercept)  cereal9$fat
     117.60        22.36

> plot(cereal9$fat,cereal9$calories)
> fit<-lm(cereal9$calories~cereal9$fat)
> abline(fit)

Correlation: calories vs. fat

> cor(cereal9$fat,cereal9$calories)
[1] 0.5901757

> summary(fit)

Call:
lm(formula = cereal9$calories ~ cereal9$fat)

Residuals:
   Min     1Q Median     3Q    Max
-67.60 -29.96  -7.04  16.73 322.40

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  117.599      8.350  14.084  < 2e-16 ***
cereal9$fat   22.361      3.854   5.803 2.29e-07 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 50.78 on 63 degrees of freedom
Multiple R-squared: 0.3483, Adjusted R-squared: 0.338
F-statistic: 33.67 on 1 and 63 DF, p-value: 2.292e-07

CI:
                2.5 %    97.5 %
(Intercept) 100.91250 134.28519
cereal9$fat  14.66031  30.06174

Appendix C: Regression 2

Scatter Plot

Regression 2 Summary:

Call:
lm(formula = cereal9$calories ~ cereal9$carbo)

Residuals:
    Min      1Q  Median      3Q     Max
-72.529 -27.725   1.093  19.468 101.306

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)    33.3401    12.3658   2.696  0.00899 **
cereal9$carbo   5.8128     0.5708  10.183 6.13e-15 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 38.67 on 63 degrees of freedom
Multiple R-squared: 0.6221, Adjusted R-squared: 0.6161
F-statistic: 103.7 on 1 and 63 DF, p-value: 6.132e-15

> confint(fit)
                 2.5 %    97.5 %
(Intercept)   8.628982 58.051275
cereal9$carbo 4.672149  6.953486

Appendix D: Correlation Matrix of Cereal9

           calories   protein       fat    sodium     fibre
calories  1.0000000 0.7060105 0.5901757 0.5286552 0.3882179
protein   0.7060105 1.0000000 0.4112661 0.5727222 0.8096397
fat       0.5901757 0.4112661 1.0000000 0.2595606 0.2260715
sodium    0.5286552 0.5727222 0.2595606 1.0000000 0.4954831
fibre     0.3882179 0.8096397 0.2260715 0.4954831 1.0000000
carbo     0.7887227 0.5470903 0.1828522 0.4235617 0.2030749
sugars    0.4952942 0.1848484 0.4156740 0.2112437 0.1489158
shelf     0.4263400 0.3963311 0.3256975 0.2341275 0.3578429
potassium 0.4765955 0.8417540 0.3232754 0.5566426 0.9638662

                carbo      sugars     shelf
calories   0.78872268  0.49529421 0.4263400
protein    0.54709029  0.18484845 0.3963311
fat        0.18285220  0.41567397 0.3256975
sodium     0.42356172  0.21124365 0.2341275
fibre      0.20307489  0.14891577 0.3578429
carbo      1.00000000 -0.04082599 0.2604599
sugars    -0.04082599  1.00000000 0.2900511
shelf      0.26045989  0.29005112 1.0000000
potassium  0.24204848  0.27183347 0.4262529

          potassium
calories  0.4765955
protein   0.8417540
fat       0.3232754
sodium    0.5566426
fibre     0.9638662
carbo     0.2420485
sugars    0.2718335
shelf     0.4262529
potassium 1.0000000

Appendix E: Covariance Matrix for Cereal9

            calories     protein        fat      sodium       fibre      carbo     sugars      shelf   potassium
calories  3895.24210 116.4428496 60.6743831  4310.04119  148.608725 416.865952 180.380317 22.3463574  5362.72353
protein    116.44285   6.9834322  1.7902522   197.70613   13.122839  12.243296   2.850421  0.8795813   401.04019
fat         60.67438   1.7902522  2.7133993    55.85182    2.284043   2.550715   3.995474  0.4505621    96.00585
sodium    4310.04119 197.7061302 55.8518171 17064.09843  396.983157 468.557877 161.021552 25.6848758 13109.50747
fibre      148.60872  13.1228392  2.2840426   396.98316   37.618644  10.547819   5.329678  1.8432207  1065.82659
carbo      416.86595  12.2432956  2.5507148   468.55788   10.547819  71.714955  -2.017438  1.8523759   369.55191
sugars     180.38032   2.8504205  3.9954744   161.02155    5.329678  -2.017438  34.050018  1.4214010   285.97616
shelf       22.34636   0.8795813  0.4505621    25.68488    1.843221   1.852376   1.421401  0.7052885    64.53852
potassium 5362.72353 401.0401866 96.0058537 13109.50747 1065.826587 369.551907 285.976158 64.5385203 32503.97330

Appendix F: Data Summary for Cereal9

9 Variables    65 Observations
--------------------------------------------------------------------------------
calories
       n missing  unique    Mean     .05     .10     .25     .50     .75     .90     .95
      65       0      28   149.4   89.87  100.00  110.00  134.33  179.10  212.12  235.04

lowest :  50.00  73.33  82.71  88.00  97.35
highest: 220.00 238.81 260.00 363.64 440.00
--------------------------------------------------------------------------------
protein
       n missing  unique    Mean     .05     .10     .25     .50     .75     .90     .95
      65       0      20   3.684   1.000   1.000   2.000   3.000   4.478   6.000   8.873

lowest :  0.7519  0.8850  1.0000  1.3333  2.0000
highest:  6.0000  8.0000  9.0909 12.0000 12.1212
--------------------------------------------------------------------------------
fat
       n missing  unique    Mean     .05     .10     .25     .50     .75     .90     .95
      65       0      14   1.423    0.00    0.00    0.00    1.00    2.00    3.03    4.00

                  0 0.6666667         1 1.1363636 1.3333333 1.4925373       1.6         2 2.6666667
Frequency        22         1        10         1         9         4         1         2         3
%                34         2        15         2        14         6         2         3         5

          2.9850746  3.030303         4         6 9.0909091
Frequency         4         2         4         1         1
%                 6         3         6         2         2
--------------------------------------------------------------------------------
sodium
       n missing  unique    Mean     .05     .10     .25     .50     .75     .90     .95
      65       0      41   237.8    0.00   91.33  180.00  232.00  290.00  337.33  370.31

lowest :   0.00  51.14  90.00  93.33 125.00
highest: 358.21 373.33 393.94 680.00 787.88
--------------------------------------------------------------------------------
fibre
       n missing  unique    Mean     .05     .10     .25     .50     .75     .90     .95
      65       0      23   3.871   0.000   0.000   0.000   2.000   4.478   7.785  11.418

lowest :  0.000  1.000  1.333  1.600  2.000
highest:  9.091 12.000 27.273 28.000 30.303
--------------------------------------------------------------------------------
carbo
       n missing  unique    Mean     .05     .10     .25     .50     .75     .90     .95
      65       0      40   19.97   12.00   12.00   15.00   18.67   22.39   28.00   29.97

lowest : 10.53 11.00 12.00 12.50 13.00, highest: 29.85 30.00 31.34 39.39 68.00
--------------------------------------------------------------------------------
sugars
       n missing  unique    Mean     .05     .10     .25     .50     .75     .90     .95
      65       0      33   10.05    0.16    2.00    4.00   12.00   14.00   16.63   19.16

lowest :  0.00  0.80  1.77  2.00  3.00, highest: 17.91 18.18 19.40 20.00 20.90
--------------------------------------------------------------------------------
shelf
       n missing  unique    Mean
      65       0       3   2.169

1 (18, 28%), 2 (18, 28%), 3 (29, 45%)
--------------------------------------------------------------------------------
potassium
       n missing  unique    Mean     .05     .10     .25     .50     .75     .90     .95
      65       0      50   159.1   25.26   28.00   45.00   96.59  220.00  313.21  382.45

lowest :  15.00  20.00  25.00  26.32  26.55
highest: 360.00 388.06 660.00 848.48 969.70
--------------------------------------------------------------------------------

Appendix G: Cereal9 Density Plot

Appendix H: Gpairs Plot

Appendix I: Pairs Plot

Appendix J: Overlay Plot

Appendix K: Histograms for Cereal9

Appendix L: Loess Data

Number of Observations: 65
Equivalent Number of Parameters: 2.98
Residual Standard Error: 38.7
Trace of smoother matrix: 3.01

Control settings:
  normalize:  TRUE
  span     :  5
  degree   :  2
  family   :  gaussian
  surface  :  interpolate  cell = 0.2
> predict(loessPrediction,cereal9$carbo)
 [1] 123.2949 154.0816 127.4876 117.6665 103.3392 168.8920 160.2793 144.6882
 [9] 127.4876 115.7281 134.1537 179.7651 108.0608 158.2261 152.9713 112.8367
[17] 108.0608 147.7690 152.9713 120.9164 168.8920 103.3392 140.9144 134.9936
[25] 137.0684 152.4253 134.1537 110.4420 147.7690 132.7064 453.0708 257.7993
[33] 108.0608 124.1900 101.1216 183.4352 117.6665 137.0684 108.0608 176.3349
[41] 122.5502 160.2793 209.8177 185.2788 129.5668 147.7690 112.8367 190.8437
[49] 140.9144 152.9713 202.1262 149.6045 158.2261 192.8495 201.2773 108.0608
[57] 127.4876 152.9713 122.5502 127.4876 190.8437 112.8367 176.3349 132.4783
[65] 154.7171

Appendix M: Loess Demo Plot

Appendix N: Loess Prediction Data

 [1]  88.826310  58.039598 -27.487560  29.000133   6.660814   4.441299
 [7] -25.950985 -10.359811  32.512440 -27.728116  25.846305  40.234893
[13]   1.939193 -48.226111 -52.971342  -2.836659   1.939193  72.231026
[19] -42.971342  12.416944 -35.558701   6.660814   5.752309  -9.993633
[25]  42.036082  26.679136  12.512975   3.194393  -1.102304 -19.070032
[31] -13.070753 105.837102  11.939193  22.476622 -18.414864   3.231432
[37] -44.333207  12.185332   1.939193  62.471056 -22.550238  18.825135
[43]  -0.862432  74.721180  49.537727 -47.768974 -62.836659   9.156288
[49]  19.085639  47.028658 -22.126219 -52.259374 -48.226111 -58.521163
[55] -66.948897  38.605863 -17.487560 -42.971342  17.449762 -27.487560
[61] -44.177042  -2.836659 -27.081184 -32.478297  -8.050432

Appendix O: 3D Loess Prediction Plot (Using *)

Appendix P: 3D Loess Prediction Plot (Using +)

Appendix Q: R Code

# 1) a) Load the data and summarize it
library(MASS)                  # provides the UScereal data set
cereal9 <- UScereal[2:10]      # drop column 1 (mfr) and column 11 (vitamins)
library(Hmisc)                 # provides describe() and datadensity()
describe(cereal9)
apply(cereal9,2,summary)

# b) Data density plot, then histograms of all 9 variables in one figure
datadensity(cereal9)
par(mfrow=c(3,3))
for (i in seq_along(cereal9)) {
  hist(cereal9[,i])
}

# c) Correlation and covariance matrices, pairwise plots
cor(cereal9)
cov(cereal9)
pairs(cereal9)
library(car)                   # provides scatterplotMatrix() and scatter3d()
scatterplotMatrix(cereal9)
library(YaleToolkit)           # provides gpairs()
gpairs(cereal9)

# 2) a) i)
#First Linear Regression
fit.calories.fat<-lm(calories~fat,cereal9)
fit.calories.fat
r.calories.fat<-cor(cereal9$calories,cereal9$fat)
r.calories.fat
r2.calories.fat<-r.calories.fat^2
r2.calories.fat
summary(fit.calories.fat)
confint(fit.calories.fat)

#Second Linear Regression
fit.calories.carbo<-lm(calories~carbo,cereal9)
fit.calories.carbo
r.calories.carbo<-cor(cereal9$calories,cereal9$carbo)
r.calories.carbo
r2.calories.carbo<-r.calories.carbo^2
r2.calories.carbo
summary(fit.calories.carbo)
confint(fit.calories.carbo)

# ii) Scatter plots with the fitted regression lines
plot(cereal9$fat,cereal9$calories)
abline(fit.calories.fat)
plot(cereal9$carbo,cereal9$calories)
abline(fit.calories.carbo)

# 3) Loess fit of calories on fat, with predictions and prediction errors
loessObject<-loess(cereal9$calories~cereal9$fat,cereal9, span =0.65)
summary(loessObject)
loessPrediction<-predict(loessObject)
scatter.smooth(cereal9$fat,loessPrediction, span=0.65)
errorsOfPredictions<-cereal9$calories-loessPrediction
errorsOfPredictions
plot(cereal9$calories,errorsOfPredictions)
plot(loessPrediction,errorsOfPredictions)

# c) Loess with two predictors
loessTwoPredictors<-loess(cereal9$calories~cereal9$fat+cereal9$carbo,cereal9, span =0.65)
# Open question: how much does the second variable add to the first?
# summary(loessTwoPredictors) did not give much information.
# Looking at a scatter3d plot may help (scatter3d() is in the car package
# loaded above and needs the rgl package to be installed).
scatter3d(calories~fat+carbo,cereal9)
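
One possible way to approach the open question above (how much does the second variable add to the first?) is to compare nested linear models rather than loess fits. The sketch below is not part of the original analysis. It assumes the same UScereal data from the MASS package, and the object names fit.fat, fit.fat.carbo, r2.fat, r2.fat.carbo and r2.gain are made up for illustration. It reports the increase in R² when carbo is added to a model that already contains fat, and a partial F test of that increase.

```r
library(MASS)  # provides the UScereal data set

# Same subset as in Appendix Q: drop mfr (column 1) and vitamins (column 11)
cereal9 <- UScereal[2:10]

# Nested linear models (illustrative names, not from the original code):
# fat alone, then fat plus carbo
fit.fat <- lm(calories ~ fat, data = cereal9)
fit.fat.carbo <- lm(calories ~ fat + carbo, data = cereal9)

# Share of the variability in calories explained by each model
r2.fat <- summary(fit.fat)$r.squared
r2.fat.carbo <- summary(fit.fat.carbo)$r.squared

# Additional share explained once carbo is added to fat
r2.gain <- r2.fat.carbo - r2.fat
print(c(fat = r2.fat, fat.carbo = r2.fat.carbo, gain = r2.gain))

# Partial F test: does adding carbo significantly reduce the residual error?
print(anova(fit.fat, fit.fat.carbo))
```

This is a linear-model stand-in for the loess comparison. The same idea could in principle be applied to the two loess fits by comparing their residual standard errors, but that comparison is not worked out here.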