Regression Assignment
Catherine Dumas
Dima Kassab
Anthony Leonardi
EPSY 887
10 March, 2011
Regressions were run on the UScereal data set from R's MASS package. The raw data are shown in
Appendix A. Column 1 (the manufacturers) and column 11 (vitamins) were removed before the
analysis was performed, and the resulting data set was called “cereal9.” The summary of the full
data set is included in Appendix A for completeness but will not be discussed further.
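A minimal sketch of this preparation step, assuming the MASS package's `UScereal` data frame as loaded in Appendix Q. Dropping the two factor columns by position gives the same nine columns as the `UScereal[2:10]` call used there.

```r
library(MASS)  # provides the UScereal data frame (65 cereals, 11 columns)

# Drop column 1 (mfr) and column 11 (vitamins), both factors,
# keeping the nine numeric columns analysed in this paper.
cereal9 <- UScereal[, -c(1, 11)]

dim(cereal9)    # rows and columns
names(cereal9)  # the nine retained variables
```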
Appendix B shows the scatter plot, including the regression line, for the first simple
linear regression performed. The predictor (independent) variable was fat, and the criterion
(dependent) variable was calories. The regression summary, obtained from the fitted model,
appears below the graph in Appendix B. From this information, the regression equation is:
Ŷ (calories) = 117.60 + 22.36(Fat)
This equation tells us, first, that a cereal with no fat is predicted to have 117.60 calories,
and second, that each additional gram of fat raises the predicted calories by about 22.36. The
value 3.854 reported beside the slope is the standard error of that slope, not an error term;
the spread of the data around the line is given by the residual standard error, 50.78 calories.
Because fat accounts for only about 35% of the variability in calories (see R² below), more
factors should be added to the equation (multiple regression). Adding more predictors may
decrease the residual standard error as more of the variability is accounted for.
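To make the reading of the equation concrete, this R sketch refits the first model with a `data =` argument, so the slope is named `fat` rather than `cereal9$fat` (the estimates are the same). It checks that each extra gram of fat adds the slope to the prediction and shows where the slope's standard error and the residual standard error come from.

```r
library(MASS)
cereal9 <- UScereal[, -c(1, 11)]

# Same model as Regression 1, written with data = so the slope is named "fat"
fit.calories.fat <- lm(calories ~ fat, data = cereal9)
b <- coef(fit.calories.fat)
b                       # intercept about 117.60, slope about 22.36

# Predicted calories at 0, 1 and 2 grams of fat
pred <- predict(fit.calories.fat, newdata = data.frame(fat = c(0, 1, 2)))
pred[1]                 # the intercept: prediction at zero fat
diff(pred)              # each extra gram adds the slope

# 3.854 is the standard error of the slope, not an error term ...
coef(summary(fit.calories.fat))["fat", "Std. Error"]
# ... the spread of the data around the line is the residual standard error
sigma(fit.calories.fat) # about 50.78 calories
```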
The scatter plot in Appendix B shows a moderate spread around the regression line (line of
best fit). It also shows one outlier. The t(63) statistic (5.803) is significant at p < 0.001, as
is the F(1, 63) statistic (33.67); with a single predictor, F is simply t squared (5.803² ≈ 33.67).
Both tell us that fat is a significant predictor of calories. The 95% confidence interval for the
slope (14.66, 30.06) does not include zero, which also shows the result to be significant: with
95% confidence, the population slope lies between these two values. Also, according to Cumming
and Maillardet (2006), a 95% confidence interval will, on average, capture about 83.4% of the
estimates from future replications of this experiment.
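The relationships above (F = t² for one predictor, and a confidence interval built as estimate ± critical t × standard error) can be checked directly. This is a sketch under the same assumptions as the first regression in Appendix Q; `qt()` supplies the critical value for 63 degrees of freedom.

```r
library(MASS)
cereal9 <- UScereal[, -c(1, 11)]
fit <- lm(calories ~ fat, data = cereal9)

est <- coef(summary(fit))["fat", "Estimate"]
se  <- coef(summary(fit))["fat", "Std. Error"]
rdf <- df.residual(fit)                        # 65 cereals - 2 parameters = 63

t_stat <- est / se                             # about 5.803
f_stat <- summary(fit)$fstatistic[["value"]]   # about 33.67
all.equal(t_stat^2, f_stat)                    # one predictor: F = t^2

# 95% CI for the slope: estimate +/- critical t value * standard error
t_crit <- qt(0.975, rdf)
ci <- est + c(-1, 1) * t_crit * se
ci                                             # about 14.66 to 30.06
confint(fit)["fat", ]                          # same interval from confint()
```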
R² for this regression is 0.3483, with an adjusted R² of 0.338. This means that about 35% of
the variation in calories can be explained by fat. Adjusted R² lowers R² to account for the
number of predictors relative to the sample size, so it is the better measure when comparing
models with different numbers of predictors; here it indicates that about 34% of the variability
in calories is explained by fat content. One other value is important here: the correlation of
fat with calories is 0.59, which is fairly high. In a simple regression, R² is the square of this
correlation (0.59² ≈ 0.348).
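A short sketch checking both statements numerically, assuming the same `cereal9` data frame: R² equals the squared correlation in simple regression, and adjusted R² follows the usual formula 1 - (1 - R²)(n - 1)/(n - p - 1), with n = 65 cereals and p = 1 predictor.

```r
library(MASS)
cereal9 <- UScereal[, -c(1, 11)]
fit <- lm(calories ~ fat, data = cereal9)

n <- nrow(cereal9)   # 65 cereals
p <- 1               # one predictor
r  <- cor(cereal9$calories, cereal9$fat)   # about 0.59
r2 <- summary(fit)$r.squared               # about 0.348

# In simple regression, R-squared is the squared correlation.
all.equal(r^2, r2)

# Adjusted R-squared penalizes R-squared for the number of predictors p.
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)
all.equal(adj_r2, summary(fit)$adj.r.squared)   # about 0.338
```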
The second regression used carbohydrates as the predictor (independent) variable, keeping
calories as the criterion (dependent) variable. Appendix C shows the scatter plot, including the
regression line, for this second simple linear regression. As before, the regression summary was
obtained from the fitted model and appears below the graph in Appendix C. From this information,
the regression equation is:
Ŷ (calories) = 33.34 + 5.81(Carbohydrates)
This equation tells us, first, that a cereal with no carbohydrates is predicted to have 33.34
calories, although no cereal in the data comes close to zero carbohydrates (the minimum is 10.53),
so the intercept is an extrapolation. Second, each additional gram of carbohydrate raises the
predicted calories by about 5.81. As before, 0.5708 is the standard error of the slope rather
than an error term; the residual standard error is 38.67 calories, noticeably smaller than the
50.78 of the fat model. More factors could still be added to the equation (multiple regression),
and adding predictors may decrease this value further as more of the variability is accounted for.
The scatter plot in Appendix C shows a small to moderate spread around the regression line
(line of best fit). It also shows two possible outliers. The t(63) statistic (10.183) is
significant at p < 0.001, as is the F(1, 63) statistic (103.7). Both tell us that carbohydrate is
a significant predictor of calories. The 95% confidence interval for the slope (4.67, 6.95) does
not include zero, which also shows the result to be significant: with 95% confidence, the
population slope lies between these two values. As noted above (Cumming & Maillardet, 2006), such
an interval will, on average, capture about 83.4% of the estimates from future replications.
R² for this second regression is 0.6221, with an adjusted R² of 0.6161, meaning that about 62%
of the variation in calories can be explained by carbohydrates. Comparing the two regressions,
more of the variability in calories is explained by carbohydrates than by fat (62% versus 35%).
Because both models have one predictor and the same 65 cereals, R² and adjusted R² rank them the
same way.
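The comparison can be tabulated side by side. This sketch, assuming the same `cereal9` data frame, collects R², adjusted R², and the residual standard error for the two simple regressions.

```r
library(MASS)
cereal9 <- UScereal[, -c(1, 11)]

fits <- list(
  fat   = lm(calories ~ fat,   data = cereal9),
  carbo = lm(calories ~ carbo, data = cereal9)
)

# One row per model: R-squared, adjusted R-squared, residual standard error.
comparison <- t(sapply(fits, function(f) {
  s <- summary(f)
  c(r.squared = s$r.squared, adj.r.squared = s$adj.r.squared, sigma = s$sigma)
}))
round(comparison, 4)
```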
The correlation matrix (Appendix D) shows the correlations among all 9 variables. Several
correlations are large: fibre with potassium (0.96), protein with potassium (0.84), protein with
fibre (0.81), carbohydrates with calories (0.79), and protein with calories (0.71). Some are more
moderate: fat with calories (0.59), sodium with protein (0.57), sodium with potassium (0.56),
sodium with calories (0.53), and others. Most of the rest fall between about 0.2 and 0.4, with a
few below 0.2 (for example, carbohydrates with sugars, -0.04).
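Scanning a 9 × 9 matrix by eye is error-prone, so the strongest correlations can also be listed programmatically. A sketch, assuming the same `cereal9` data frame; `pairs_df` is an illustrative name, not from the original code.

```r
library(MASS)
cereal9 <- UScereal[, -c(1, 11)]

r <- cor(cereal9)

# Take each pair once (upper triangle) and sort by absolute correlation.
idx <- which(upper.tri(r), arr.ind = TRUE)
pairs_df <- data.frame(
  var1 = rownames(r)[idx[, "row"]],
  var2 = colnames(r)[idx[, "col"]],
  r    = r[idx],
  stringsAsFactors = FALSE
)
pairs_df <- pairs_df[order(-abs(pairs_df$r)), ]
head(pairs_df, 5)   # strongest five pairs
```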
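The covariance matrix (Appendix E) shows how pairs of variables vary together. Some
covariances are quite large (13,109.5 for sodium and potassium, for example), while others are
less than 1 (0.45 for shelf and fat). Because a covariance is expressed in the units of the two
variables involved, these magnitudes are not directly comparable; the correlation matrix in
Appendix D is the standardized, comparable version.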
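The link between the two matrices can be shown directly: dividing each covariance by the product of the two standard deviations gives the correlation, which is what base R's `cov2cor()` does. A sketch assuming the same `cereal9` data frame; the rescaling of sodium by 1000 is purely illustrative.

```r
library(MASS)
cereal9 <- UScereal[, -c(1, 11)]

S   <- cov(cereal9)
sds <- sqrt(diag(S))                # standard deviation of each variable
R_manual <- S / outer(sds, sds)     # covariance / (sd_i * sd_j)

all.equal(R_manual, cor(cereal9))   # TRUE
all.equal(cov2cor(S), cor(cereal9)) # TRUE: base R does the same rescaling

# Rescaling one variable changes its covariances but not its correlations.
S["sodium", "potassium"]                        # about 13109.5
cov(cereal9$sodium / 1000, cereal9$potassium)   # 1000 times smaller
cor(cereal9$sodium / 1000, cereal9$potassium)   # unchanged, about 0.557
```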
Appendix F shows the data summary for the data set used (cereal9).
Appendix G shows the data density plot. What we found interesting about this plot was
the clumping of the data for most of the variables. It appears that sugars vary widely in cereals!
The next three appendices show pairwise plots of the variables. Appendix H shows the gpairs
plot, which displays the data pair by pair. Thus, calories vs. protein is shown in the first
square next to “calories” in the top row. Note the grouping of most of the data, except in some of
the “sugars” panes.
Appendix I shows another plot of the pairs, but this time the points are larger and easier
to see at a quick glance.
Appendix J illustrates a data overlay plot, in which loess smooths (described below) are
overlaid on the data for each pair of variables. This shows the variability of the data points
around the loess curve for each pair.
The histograms (Appendix K) are another way to visualize the data. We were able to generate
all 9 graphs in one figure, which allows us to see all the distributions at once. Most of the data
are clumped toward the y-axis, that is, at the low end of each scale, which indicates positive
(right) skew. However, as before, the sugars histogram (#7) immediately stands out as having a
different distribution, possibly bimodal.
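The visual impression of positive skew can be checked numerically. Base R has no skewness function, so this sketch defines a simple moment-based one (a hypothetical helper named `skewness`, not part of the original code), assuming the same `cereal9` data frame. Skewness alone cannot detect the possible bimodality in sugars; that still needs the histogram.

```r
library(MASS)
cereal9 <- UScereal[, -c(1, 11)]

# Hypothetical helper: moment-based skewness, mean cubed deviation / SD^3.
# Positive values indicate a long right tail (data clumped at low values).
skewness <- function(x) {
  m <- mean(x)
  s <- sqrt(mean((x - m)^2))   # population-style SD for the moment ratio
  mean((x - m)^3) / s^3
}

sk <- sapply(cereal9, skewness)
round(sk, 2)
```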
Appendix L shows the loess output and its predicted values, with the demo plot illustrated in
Appendix M; the plot illustrates the predictive capability of the loess fit. Appendix N lists the
prediction errors (observed calories minus the loess predictions), and the next two appendices
show 3D plots of the predictions. Appendix O shows the plot for the model specified as a product
(using *), whereas Appendix P uses the additive form (using +). We don’t see much difference
between the two plots.
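One likely reason the two 3D plots look alike: in R's `loess()`, the formula only names the predictors, so `fat * carbo` and `fat + carbo` select the same two variables and should produce the same local surface. A sketch that checks this, assuming the same `cereal9` data frame and the span of 0.65 used in Appendix Q.

```r
library(MASS)
cereal9 <- UScereal[, -c(1, 11)]

# loess() takes the predictors named in the formula; '*' and '+' both name
# fat and carbo, so both calls fit the same local (non-additive) surface.
lo_product  <- loess(calories ~ fat * carbo, data = cereal9, span = 0.65)
lo_additive <- loess(calories ~ fat + carbo, data = cereal9, span = 0.65)

all.equal(fitted(lo_product), fitted(lo_additive))
```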
Appendix Q lists the R code used in this paper.
Conclusions
Regression is a useful technique for prediction. It estimates how much of the variability in a
criterion variable can be accounted for by one or more predictor variables. The adjusted R²
indicates how much of the variability the predictors account for, after allowing for how many
predictors are used, so more predictors can be added and another regression run in an attempt to
increase this value and lower the residual standard error. Additionally, R offers many types of
graphical analysis that other programs may not.
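Following this suggestion, carbohydrates can be added to the fat model and the two nested models compared, which also gives one answer, for linear models, to the question noted in Appendix Q of how much the second variable adds to the first. A sketch assuming the same `cereal9` data frame; `anova()` on two nested `lm` fits performs the partial F-test.

```r
library(MASS)
cereal9 <- UScereal[, -c(1, 11)]

fit_fat  <- lm(calories ~ fat, data = cereal9)           # Regression 1
fit_both <- lm(calories ~ fat + carbo, data = cereal9)   # fat plus carbohydrates

# Adjusted R-squared and residual standard error before and after adding carbo
c(adj_r2_fat  = summary(fit_fat)$adj.r.squared,
  adj_r2_both = summary(fit_both)$adj.r.squared,
  sigma_fat   = sigma(fit_fat),
  sigma_both  = sigma(fit_both))

# Partial F-test for nested models: does carbo add to a model that has fat?
anova(fit_fat, fit_both)
```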
Reference
Cumming, G., & Maillardet, R. (2006). Confidence intervals and replication: Where will the
next mean fall? Psychological Methods, 11(3), 217-227.
Appendix A: Raw Data
                                       calories    protein       fat    sodium     fibre    carbo    sugars shelf potassium
100% Bran                             212.12121 12.1212121 3.0303030 393.93939 30.303030 15.15152 18.181818     3 848.48485
All-Bran                              212.12121 12.1212121 3.0303030 787.87879 27.272727 21.21212 15.151515     3 969.69697
All-Bran with Extra Fiber             100.00000  8.0000000 0.0000000 280.00000 28.000000 16.00000  0.000000     3 660.00000
Apple Cinnamon Cheerios               146.66667  2.6666667 2.6666667 240.00000  2.000000 14.00000 13.333333     1  93.33333
Apple Jacks                           110.00000  2.0000000 0.0000000 125.00000  1.000000 11.00000 14.000000     2  30.00000
Basic 4                               173.33333  4.0000000 2.6666667 280.00000  2.666667 24.00000 10.666667     3 133.33333
Bran Chex                             134.32836  2.9850746 1.4925373 298.50746  5.970149 22.38806  8.955224     1 186.56716
Bran Flakes                           134.32836  4.4776119 0.0000000 313.43284  7.462687 19.40299  7.462687     3 283.58209
Cap'n'Crunch                          160.00000  1.3333333 2.6666667 293.33333  0.000000 16.00000 16.000000     2  46.66667
Cheerios                               88.00000  4.8000000 1.6000000 232.00000  1.600000 13.60000  0.800000     1  84.00000
Cinnamon Toast Crunch                 160.00000  1.3333333 4.0000000 280.00000  0.000000 17.33333 12.000000     2  60.00000
Clusters                              220.00000  6.0000000 4.0000000 280.00000  4.000000 26.00000 14.000000     3 210.00000
Cocoa Puffs                           110.00000  1.0000000 1.0000000 180.00000  0.000000 12.00000 13.000000     2  55.00000
Corn Chex                             110.00000  2.0000000 0.0000000 280.00000  0.000000 22.00000  3.000000     1  25.00000
Corn Flakes                           100.00000  2.0000000 0.0000000 290.00000  1.000000 21.00000  2.000000     1  35.00000
Corn Pops                             110.00000  1.0000000 0.0000000  90.00000  1.000000 13.00000 12.000000     2  20.00000
Count Chocula                         110.00000  1.0000000 1.0000000 180.00000  0.000000 12.00000 13.000000     2  65.00000
Cracklin' Oat Bran                    220.00000  6.0000000 6.0000000 280.00000  8.000000 20.00000 14.000000     3 320.00000
Crispix                               110.00000  2.0000000 0.0000000 220.00000  1.000000 21.00000  3.000000     3  30.00000
Crispy Wheat & Raisins                133.33333  2.6666667 1.3333333 186.66667  2.666667 14.66667 13.333333     3 160.00000
Double Chex                           133.33333  2.6666667 0.0000000 253.33333  1.333333 24.00000  6.666667     3 106.66667
Froot Loops                           110.00000  2.0000000 1.0000000 125.00000  1.000000 11.00000 13.000000     2  30.00000
Frosted Flakes                        146.66667  1.3333333 0.0000000 266.66667  1.333333 18.66667 14.666667     1  33.33333
Frosted Mini-Wheats                   125.00000  3.7500000 0.0000000   0.00000  3.750000 17.50000  8.750000     2 125.00000
Fruit & Fibre: Dates Walnuts and Oats 179.10448  4.4776119 2.9850746 238.80597  7.462687 17.91045 14.925373     3 298.50746
Fruitful Bran                         179.10448  4.4776119 0.0000000 358.20896  7.462687 20.89552 17.910448     3 283.58209
Fruity Pebbles                        146.66667  1.3333333 1.3333333 180.00000  0.000000 17.33333 16.000000     2  33.33333
Golden Crisp                          113.63636  2.2727273 0.0000000  51.13636  0.000000 12.50000 17.045455     1  45.45455
Golden Grahams                        146.66667  1.3333333 1.3333333 373.33333  0.000000 20.00000 12.000000     2  60.00000
Grape Nuts Flakes                     113.63636  3.4090909 1.1363636 159.09091  3.409091 17.04545  5.681818     3  96.59091
Grape-Nuts                            440.00000 12.0000000 0.0000000 680.00000 12.000000 68.00000 12.000000     3 360.00000
Great Grains Pecan                    363.63636  9.0909091 9.0909091 227.27273  9.090909 39.39394 12.121212     3 303.03030
Honey Graham Ohs                      120.00000  1.0000000 2.0000000 220.00000  1.000000 12.00000 11.000000     2  45.00000
Honey Nut Cheerios                    146.66667  4.0000000 1.3333333 333.33333  2.000000 15.33333 13.333333     1 120.00000
Honey-comb                             82.70677  0.7518797 0.0000000 135.33835  0.000000 10.52632  8.270677     1  26.31579
Just Right Fruit & Nut                186.66667  4.0000000 1.3333333 226.66667  2.666667 26.66667 12.000000     3 126.66667
Kix                                    73.33333  1.3333333 0.6666667 173.33333  0.000000 14.00000  2.000000     2  26.66667
Life                                  149.25373  5.9701493 2.9850746 223.88060  2.985075 17.91045  8.955224     2 141.79104
Lucky Charms                          110.00000  2.0000000 1.0000000 180.00000  0.000000 12.00000 12.000000     2  55.00000
Mueslix Crispy Blend                  238.80597  4.4776119 2.9850746 223.88060  4.477612 25.37313 19.402985     3 238.80597
Multi-Grain Cheerios                  100.00000  2.0000000 1.0000000 220.00000  2.000000 15.00000  6.000000     1  90.00000
Nut&Honey Crunch                      179.10448  2.9850746 1.4925373 283.58209  0.000000 22.38806 13.432836     2  59.70149
Nutri-Grain Almond-Raisin             208.95522  4.4776119 2.9850746 328.35821  4.477612 31.34328 10.447761     3 194.02985
Oatmeal Raisin Crisp                  260.00000  6.0000000 4.0000000 340.00000  3.000000 27.00000 20.000000     3 240.00000
Post Nat. Raisin Bran                 179.10448  4.4776119 1.4925373 298.50746  8.955224 16.41791 20.895522     3 388.05970
Product 19                            100.00000  3.0000000 0.0000000 320.00000  1.000000 20.00000  3.000000     3  45.00000
Puffed Rice                            50.00000  1.0000000 0.0000000   0.00000  0.000000 13.00000  0.000000     3  15.00000
Quaker Oat Squares                    200.00000  8.0000000 2.0000000 270.00000  4.000000 28.00000 12.000000     3 220.00000
Raisin Bran                           160.00000  4.0000000 1.3333333 280.00000  6.666667 18.66667 16.000000     2 320.00000
Raisin Nut Bran                       200.00000  6.0000000 4.0000000 280.00000  5.000000 21.00000 16.000000     3 280.00000
Raisin Squares                        180.00000  4.0000000 0.0000000   0.00000  4.000000 30.00000 12.000000     3 220.00000
Rice Chex                              97.34513  0.8849558 0.0000000 212.38938  0.000000 20.35398  1.769912     1  26.54867
Rice Krispies                         110.00000  2.0000000 0.0000000 290.00000  0.000000 22.00000  3.000000     1  35.00000
Shredded Wheat 'n'Bran                134.32836  4.4776119 0.0000000   0.00000  5.970149 28.35821  0.000000     1 208.95522
Shredded Wheat spoon size             134.32836  4.4776119 0.0000000   0.00000  4.477612 29.85075  0.000000     1 179.10448
Smacks                                146.66667  2.6666667 1.3333333  93.33333  1.333333 12.00000 20.000000     2  53.33333
Special K                             110.00000  6.0000000 0.0000000 230.00000  1.000000 16.00000  3.000000     1  55.00000
Total Corn Flakes                     110.00000  2.0000000 1.0000000 200.00000  0.000000 21.00000  3.000000     3  35.00000
Total Raisin Bran                     140.00000  3.0000000 1.0000000 190.00000  4.000000 15.00000 14.000000     3 230.00000
Total Whole Grain                     100.00000  3.0000000 1.0000000 200.00000  3.000000 16.00000  3.000000     3 110.00000
Triples                               146.66667  2.6666667 1.3333333 333.33333  0.000000 28.00000  4.000000     3  80.00000
Trix                                  110.00000  1.0000000 1.0000000 140.00000  0.000000 13.00000 12.000000     2  25.00000
Wheat Chex                            149.25373  4.4776119 1.4925373 343.28358  4.477612 25.37313  4.477612     1 171.64179
Wheaties                              100.00000  3.0000000 1.0000000 200.00000  3.000000 17.00000  3.000000     1 110.00000
Wheaties Honey Gold                   146.66667  2.6666667 1.3333333 266.66667  1.333333 21.33333 10.666667     1  80.00000
Data Summary:
> summary(UScereal)
 mfr      calories        protein            fat           sodium
 G:22   Min.   : 50.0   Min.   : 0.7519   Min.   :0.000   Min.   :  0.0
 K:21   1st Qu.:110.0   1st Qu.: 2.0000   1st Qu.:0.000   1st Qu.:180.0
 N: 3   Median :134.3   Median : 3.0000   Median :1.000   Median :232.0
 P: 9   Mean   :149.4   Mean   : 3.6837   Mean   :1.423   Mean   :237.8
 Q: 5   3rd Qu.:179.1   3rd Qu.: 4.4776   3rd Qu.:2.000   3rd Qu.:290.0
 R: 5   Max.   :440.0   Max.   :12.1212   Max.   :9.091   Max.   :787.9
     fibre            carbo          sugars          shelf
 Min.   : 0.000   Min.   :10.53   Min.   : 0.00   Min.   :1.000
 1st Qu.: 0.000   1st Qu.:15.00   1st Qu.: 4.00   1st Qu.:1.000
 Median : 2.000   Median :18.67   Median :12.00   Median :2.000
 Mean   : 3.871   Mean   :19.97   Mean   :10.05   Mean   :2.169
 3rd Qu.: 4.478   3rd Qu.:22.39   3rd Qu.:14.00   3rd Qu.:3.000
 Max.   :30.303   Max.   :68.00   Max.   :20.90   Max.   :3.000
   potassium       vitamins
 Min.   : 15.0   100%    : 5
 1st Qu.: 45.0   enriched:57
 Median : 96.6   none    : 3
 Mean   :159.1
 3rd Qu.:220.0
 Max.   :969.7
Appendix B: Regression 1 Scatter Plot
Regression 1 Summary:
Coefficients:
(Intercept)  cereal9$fat
     117.60        22.36
> plot(cereal9$fat,cereal9$calories)
> fit<-lm(cereal9$calories~cereal9$fat)
> abline(fit)
Correlation: calories vs. fat
> cor(cereal9$fat,cereal9$calories)
[1] 0.5901757
> summary(fit)

Call:
lm(formula = cereal9$calories ~ cereal9$fat)

Residuals:
   Min     1Q Median     3Q    Max
-67.60 -29.96  -7.04  16.73 322.40

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  117.599      8.350  14.084  < 2e-16 ***
cereal9$fat   22.361      3.854   5.803 2.29e-07 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 50.78 on 63 degrees of freedom
Multiple R-squared: 0.3483,  Adjusted R-squared: 0.338
F-statistic: 33.67 on 1 and 63 DF,  p-value: 2.292e-07

CI:
                2.5 %    97.5 %
(Intercept) 100.91250 134.28519
cereal9$fat  14.66031  30.06174
Appendix C: Regression 2 Scatter Plot
Regression 2 Summary:
Call:
lm(formula = cereal9$calories ~ cereal9$carbo)

Residuals:
    Min      1Q  Median      3Q     Max
-72.529 -27.725   1.093  19.468 101.306

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)    33.3401    12.3658   2.696  0.00899 **
cereal9$carbo   5.8128     0.5708  10.183 6.13e-15 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 38.67 on 63 degrees of freedom
Multiple R-squared: 0.6221,  Adjusted R-squared: 0.6161
F-statistic: 103.7 on 1 and 63 DF,  p-value: 6.132e-15

> confint(fit)
                 2.5 %    97.5 %
(Intercept)   8.628982 58.051275
cereal9$carbo 4.672149  6.953486
Appendix D: Correlation Matrix of Cereal9
           calories   protein       fat    sodium     fibre       carbo      sugars     shelf potassium
calories  1.0000000 0.7060105 0.5901757 0.5286552 0.3882179  0.78872268  0.49529421 0.4263400 0.4765955
protein   0.7060105 1.0000000 0.4112661 0.5727222 0.8096397  0.54709029  0.18484845 0.3963311 0.8417540
fat       0.5901757 0.4112661 1.0000000 0.2595606 0.2260715  0.18285220  0.41567397 0.3256975 0.3232754
sodium    0.5286552 0.5727222 0.2595606 1.0000000 0.4954831  0.42356172  0.21124365 0.2341275 0.5566426
fibre     0.3882179 0.8096397 0.2260715 0.4954831 1.0000000  0.20307489  0.14891577 0.3578429 0.9638662
carbo     0.7887227 0.5470903 0.1828522 0.4235617 0.2030749  1.00000000 -0.04082599 0.2604599 0.2420485
sugars    0.4952942 0.1848484 0.4156740 0.2112437 0.1489158 -0.04082599  1.00000000 0.2900511 0.2718335
shelf     0.4263400 0.3963311 0.3256975 0.2341275 0.3578429  0.26045989  0.29005112 1.0000000 0.4262529
potassium 0.4765955 0.8417540 0.3232754 0.5566426 0.9638662  0.24204848  0.27183347 0.4262529 1.0000000
Appendix E: Covariance Matrix for Cereal9
            calories     protein        fat      sodium       fibre      carbo     sugars      shelf   potassium
calories  3895.24210 116.4428496 60.6743831  4310.04119  148.608725 416.865952 180.380317 22.3463574  5362.72353
protein    116.44285   6.9834322  1.7902522   197.70613   13.122839  12.243296   2.850421  0.8795813   401.04019
fat         60.67438   1.7902522  2.7133993    55.85182    2.284043   2.550715   3.995474  0.4505621    96.00585
sodium    4310.04119 197.7061302 55.8518171 17064.09843  396.983157 468.557877 161.021552 25.6848758 13109.50747
fibre      148.60872  13.1228392  2.2840426   396.98316   37.618644  10.547819   5.329678  1.8432207  1065.82659
carbo      416.86595  12.2432956  2.5507148   468.55788   10.547819  71.714955  -2.017438  1.8523759   369.55191
sugars     180.38032   2.8504205  3.9954744   161.02155    5.329678  -2.017438  34.050018  1.4214010   285.97616
shelf       22.34636   0.8795813  0.4505621    25.68488    1.843221   1.852376   1.421401  0.7052885    64.53852
potassium 5362.72353 401.0401866 96.0058537 13109.50747 1065.826587 369.551907 285.976158 64.5385203 32503.97330
Appendix F: Data Summary for Cereal9
9 Variables     65 Observations
-------------------------------------------------------------------------------
calories
      n missing  unique    Mean     .05     .10     .25     .50     .75     .90     .95
     65       0      28   149.4   89.87  100.00  110.00  134.33  179.10  212.12  235.04

lowest :  50.00  73.33  82.71  88.00  97.35
highest: 220.00 238.81 260.00 363.64 440.00
-------------------------------------------------------------------------------
protein
      n missing  unique    Mean     .05     .10     .25     .50     .75     .90     .95
     65       0      20   3.684   1.000   1.000   2.000   3.000   4.478   6.000   8.873

lowest : 0.7519 0.8850 1.0000 1.3333 2.0000
highest: 6.0000 8.0000 9.0909 12.0000 12.1212
-------------------------------------------------------------------------------
fat
      n missing  unique    Mean     .05     .10     .25     .50     .75     .90     .95
     65       0      14   1.423    0.00    0.00    0.00    1.00    2.00    3.03    4.00

           0 0.6666667  1 1.1363636 1.3333333 1.4925373 1.6 2 2.6666667
Frequency 22         1 10         1         9         4   1 2         3
%         34         2 15         2        14         6   2 3         5

          2.9850746 3.030303 4 6 9.0909091
Frequency         4        2 4 1         1
%                 6        3 6 2         2
-------------------------------------------------------------------------------
sodium
      n missing  unique    Mean     .05     .10     .25     .50     .75     .90     .95
     65       0      41   237.8    0.00   91.33  180.00  232.00  290.00  337.33  370.31

lowest :   0.00  51.14  90.00  93.33 125.00
highest: 358.21 373.33 393.94 680.00 787.88
-------------------------------------------------------------------------------
fibre
      n missing  unique    Mean     .05     .10     .25     .50     .75     .90     .95
     65       0      23   3.871   0.000   0.000   0.000   2.000   4.478   7.785  11.418

lowest :  0.000  1.000  1.333  1.600  2.000
highest:  9.091 12.000 27.273 28.000 30.303
-------------------------------------------------------------------------------
carbo
      n missing  unique    Mean     .05     .10     .25     .50     .75     .90     .95
     65       0      40   19.97   12.00   12.00   15.00   18.67   22.39   28.00   29.97

lowest : 10.53 11.00 12.00 12.50 13.00, highest: 29.85 30.00 31.34 39.39 68.00
-------------------------------------------------------------------------------
sugars
      n missing  unique    Mean     .05     .10     .25     .50     .75     .90     .95
     65       0      33   10.05    0.16    2.00    4.00   12.00   14.00   16.63   19.16

lowest :  0.00  0.80  1.77  2.00  3.00, highest: 17.91 18.18 19.40 20.00 20.90
-------------------------------------------------------------------------------
shelf
      n missing  unique    Mean
     65       0       3   2.169

1 (18, 28%), 2 (18, 28%), 3 (29, 45%)
-------------------------------------------------------------------------------
potassium
      n missing  unique    Mean     .05     .10     .25     .50     .75     .90     .95
     65       0      50   159.1   25.26   28.00   45.00   96.59  220.00  313.21  382.45

lowest :  15.00  20.00  25.00  26.32  26.55
highest: 360.00 388.06 660.00 848.48 969.70
-------------------------------------------------------------------------------
Appendix G: Cereal9 Density Plot
Appendix H: Gpairs Plot
Appendix I: Pairs Plot
Appendix J: Overlay Plot
Appendix K: Histograms for Cereal9
Appendix L: Loess Data
Number of Observations: 65
Equivalent Number of Parameters: 2.98
Residual Standard Error: 38.7
Trace of smoother matrix: 3.01
Control settings:
normalize: TRUE
span : 5
degree : 2
family : gaussian
surface : interpolate
cell = 0.2
predict(loessPrediction,cereal9$carbo)
[1] 123.2949 154.0816 127.4876 117.6665 103.3392 168.8920 160.2793 144.6882
[9] 127.4876 115.7281 134.1537 179.7651 108.0608 158.2261 152.9713 112.8367
[17] 108.0608 147.7690 152.9713 120.9164 168.8920 103.3392 140.9144 134.9936
[25] 137.0684 152.4253 134.1537 110.4420 147.7690 132.7064 453.0708 257.7993
[33] 108.0608 124.1900 101.1216 183.4352 117.6665 137.0684 108.0608 176.3349
[41] 122.5502 160.2793 209.8177 185.2788 129.5668 147.7690 112.8367 190.8437
[49] 140.9144 152.9713 202.1262 149.6045 158.2261 192.8495 201.2773 108.0608
[57] 127.4876 152.9713 122.5502 127.4876 190.8437 112.8367 176.3349 132.4783
[65] 154.7171
Appendix M: Loess Demo Plot
Appendix N: Loess Prediction Data
[1] 88.826310 58.039598 -27.487560 29.000133 6.660814 4.441299
[7] -25.950985 -10.359811 32.512440 -27.728116 25.846305 40.234893
[13] 1.939193 -48.226111 -52.971342 -2.836659 1.939193 72.231026
[19] -42.971342 12.416944 -35.558701 6.660814 5.752309 -9.993633
[25] 42.036082 26.679136 12.512975 3.194393 -1.102304 -19.070032
[31] -13.070753 105.837102 11.939193 22.476622 -18.414864 3.231432
[37] -44.333207 12.185332 1.939193 62.471056 -22.550238 18.825135
[43] -0.862432 74.721180 49.537727 -47.768974 -62.836659 9.156288
[49] 19.085639 47.028658 -22.126219 -52.259374 -48.226111 -58.521163
[55] -66.948897 38.605863 -17.487560 -42.971342 17.449762 -27.487560
[61] -44.177042 -2.836659 -27.081184 -32.478297 -8.050432
Appendix O: 3D Loess Prediction Plot (Using *)
Appendix P: 3D Loess Prediction Plot (Using +)
Appendix Q: R Code
1)
a)
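library(MASS)             # provides the UScereal data
cereal9 = UScereal[2:10]  # drop mfr (column 1) and vitamins (column 11)
library(Hmisc)            # describe() and datadensity()
describe(cereal9)
apply(cereal9,2,summary)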
b)
datadensity(cereal9)
par(mfrow=c(3,3))   # 3 x 3 grid: one panel per variable
for (i in 1:9) {
  hist(cereal9[,i])
}
c)
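cor(cereal9)                # correlation matrix (Appendix D)
cov(cereal9)                # covariance matrix (Appendix E)
pairs(cereal9)
library(car)                # scatterplotMatrix(); also scatter3d(), used in part 3
scatterplotMatrix(cereal9)
library(YaleToolkit)        # gpairs()
gpairs(cereal9)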
2)
a)
i)
#First Linear Regression
fit.calories.fat<-lm(calories~fat,cereal9)
fit.calories.fat
r.calories.fat<-cor(cereal9$calories,cereal9$fat)
r.calories.fat
r2.calories.fat<-r.calories.fat^2
r2.calories.fat
summary(fit.calories.fat)
confint(fit.calories.fat)
#Second Linear Regression
fit.calories.carbo<-lm(calories~carbo,cereal9)
fit.calories.carbo
r.calories.carbo<-cor(cereal9$calories,cereal9$carbo)
r.calories.carbo
r2.calories.carbo<-r.calories.carbo^2
r2.calories.carbo
summary(fit.calories.carbo)
confint(fit.calories.carbo)
ii)
plot(cereal9$fat,cereal9$calories)
abline(fit.calories.fat)
plot(cereal9$carbo,cereal9$calories)
abline(fit.calories.carbo)
3)
loessObject<-loess(calories~fat, data=cereal9, span=0.65)
summary(loessObject)
loessPrediction<-predict(loessObject)
scatter.smooth(cereal9$fat,loessPrediction, span=0.65)
errorsOfPredictions<-cereal9$calories-loessPrediction   # observed minus predicted calories
errorsOfPredictions
plot(cereal9$calories,errorsOfPredictions)
plot(loessPrediction,errorsOfPredictions)
c)
loessTwoPredictors<-loess(calories~fat+carbo, data=cereal9, span=0.65)
# Open question: how much does the second variable add to the first?
# summary(loessTwoPredictors) did not give much information;
# looking at a scatter3d plot may help.
scatter3d(calories~fat+carbo, data=cereal9)