Ice Cream Consumption

Introduction:
Ice cream is one of the major frozen desserts on the market. Many people enjoy it while watching their favorite television programs or after dinner. Yet what are the key factors that affect ice cream consumption? In this paper we investigate the potential factors. The data were collected from March 18, 1951 to July 11, 1953, for a total of 30 four-week periods. The potential explanatory variables are the price of the ice cream (Price), the weekly family income of the consumers (Income), and the temperature (Temp).

Methodology:
Since we are interested in the key factors that affect ice cream consumption, we conduct a series of tests to identify and examine every potential variable. Because the data were collected over time, we first draw a time-plot to reveal the relationship between ice cream consumption and time. Then we use box-plots to identify outlying observations. Next, we use the backward selection method to remove irrelevant variables. Finally, we fit the regression model to obtain an equation for estimating ice cream consumption.

Analysis:
The ice cream consumption data form a time series: the observations were recorded every 4 weeks for 30 consecutive periods. We therefore begin with a time-plot to examine whether any pattern is present over the observation period. At the same time, we fit the regression model of ice cream consumption (IC) vs. Date.

Figure 1: Regression Model of IC vs. Date
[Scatter plot of IC (vertical axis, 0.25 to 0.55) against Date (horizontal axis, 0 to 30) with the fitted line IC = 0.3337 + 0.0017 Date; N = 30, Rsq = 0.0492, AdjRsq = 0.0152, RMSE = 0.0653]

The time-plot displays a noticeable pattern: the observations rise and fall over time. We attribute this to a seasonal effect, because ice cream consumption was higher during the summer and lower in the winter.
Additionally, the regression line shows a slight increasing trend over the observation period (slope 0.0017 per period, Rsq = 0.0492). As a result, we believe the time-series factor ‘Date’ may explain some of the change in mean ice cream consumption.

Since the time-series factor affects the model, we screen for outliers by ‘Year’. Hence, we draw 3 separate box-plots: Income vs. Year, Price vs. Year, and Temp vs. Year.

Box-plot 1 (Income vs. Year):
[Box-plot of Income (vertical axis, 75 to 100) by Year (1, 2, 3)]

Box-plot 2 (Price vs. Year):
[Box-plot of Price (vertical axis, 0.26 to 0.30) by Year (1, 2, 3)]

Box-plot 3 (Temp vs. Year):
[Box-plot of Temp (vertical axis, 20 to 80) by Year (1, 2, 3)]

According to the box-plots above, there are no outliers in any of the 3 potential variables. Therefore, we move on to the backward selection method to screen out unneeded variables.

The purpose of this study is to determine the main factors for ice cream consumption. Therefore, we run “backward selection” to eliminate the ineffective variables among ‘Price’, ‘Income’, and ‘Temp’ in order to obtain a better model of ice cream consumption.
Backward Elimination: Step 0
All Variables Entered: R-Square = 0.7190 and C(p) = 4.0000

Analysis of Variance
Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               3          0.09025       0.03008     22.17   <.0001
Error              26          0.03527       0.00136
Corrected Total    29          0.12552

Variable    Parameter Estimate   Standard Error   Type II SS   F Value   Pr > F
Intercept              0.19732          0.27022   0.00072338      0.53   0.4718
Price                 -1.04441          0.83436      0.00213      1.57   0.2218
Income                 0.00331          0.00117      0.01082      7.97   0.0090
Temp                   0.00346       0.00044555      0.08174     60.25   <.0001

Bounds on condition number: 1.1444, 9.9727
------------------------------------------------------------------------
Backward Elimination: Step 1
Variable Price Removed: R-Square = 0.7021 and C(p) = 3.5669

Analysis of Variance
Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               2          0.08812       0.04406     31.81   <.0001
Error              27          0.03740       0.00139
Corrected Total    29          0.12552

Variable    Parameter Estimate   Standard Error   Type II SS   F Value   Pr > F
Intercept             -0.11320          0.10828      0.00151      1.09   0.3051
Income                 0.00353          0.00117      0.01261      9.10   0.0055
Temp                   0.00354       0.00044496      0.08784     63.41   <.0001

Bounds on condition number: 1.1179, 4.4715
------------------------------------------------------------------------
All variables left in the model are significant at the 0.0500 level.

Summary of Backward Elimination
Step   Variable Removed   Number Vars In   Partial R-Square   Model R-Square     C(p)   F Value   Pr > F
1      Price                           2             0.0169           0.7021   3.5669      1.57   0.2218

As a result of the backward selection, the variable ‘Price’ has been removed. R-Square dropped only slightly, from 0.7190 to 0.7021, while the overall F value increased from 22.17 to 31.81. This indicates that the reduced model explains nearly as much of the variation with one fewer variable and has a stronger overall significance. Therefore, it yields a better, more parsimonious model for predicting ice cream consumption.
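The figures in the selection output above can be checked by hand from the printed sums of squares. The following worked arithmetic is a sketch that uses the rounded values as printed (n = 30), so the last digit can differ slightly from the SAS output; the subscripts "full" and "red" are our own labels for the three-variable and two-variable models.

```latex
\begin{align*}
R^2_{\text{full}} &= \frac{SS_{\text{Model}}}{SS_{\text{Total}}} = \frac{0.09025}{0.12552} \approx 0.719,
&
R^2_{\text{red}} &= \frac{0.08812}{0.12552} \approx 0.702 \\[4pt]
F_{\text{Price}} &= \frac{SSE_{\text{red}} - SSE_{\text{full}}}{MSE_{\text{full}}}
  = \frac{0.03740 - 0.03527}{0.00136} = \frac{0.00213}{0.00136} \approx 1.57 \\[4pt]
F_{\text{full}} &= \frac{MS_{\text{Model}}}{MSE_{\text{full}}} = \frac{0.03008}{0.00136} \approx 22.1,
&
F_{\text{red}} &= \frac{0.04406}{0.00139} \approx 31.7
\end{align*}
```

Because Pr > F = 0.2218 for ‘Price’ exceeds the 0.05 stay level (sls = .05 in the SAS code), ‘Price’ is dropped. The overall F value rises mainly because the model sum of squares is now averaged over 2 degrees of freedom instead of 3, while R-Square falls by only 0.7190 - 0.7021 = 0.0169, which is the partial R-Square reported in the summary table.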
Predicting Ice Cream Consumption by Temperature and Income
The REG Procedure
Dependent Variable: IC

Analysis of Variance
Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               2          0.08812       0.04406     31.81   <.0001
Error              27          0.03740       0.00139
Corrected Total    29          0.12552

Root MSE           0.03722    R-Square   0.7021
Dependent Mean     0.35943    Adj R-Sq   0.6800
Coeff Var         10.35446

Parameter Estimates
Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept    1             -0.11320          0.10828     -1.05     0.3051
Temp         1              0.00354       0.00044496      7.96     <.0001
Income       1              0.00353          0.00117      3.02     0.0055

According to the regression output above, we obtain the following regression model to estimate ice cream consumption:

IC = -0.11320 + 0.00354 * Temp + 0.00353 * Income

The following figure plots the residuals of the regression model against normal quantiles. The points show a noticeable increasing trend.

Figure 2: Residual plot for IC by Temp and Income
[Residuals (vertical axis, -0.075 to 0.100) against Normal Quantile (horizontal axis, -3 to 3) for the model IC = -0.1132 + 0.0035 Temp + 0.0035 Income; N = 30, Rsq = 0.7021, AdjRsq = 0.6800, RMSE = 0.0372]

In the regression model, we notice that the variable ‘Temp’ depends on the change of season. In order to provide concrete evidence that the time-series factor ‘Date’ contributes an increasing trend to ice cream consumption, we fit the regression model of IC vs. Temp for each Year. Afterward, we superimpose these 3 regression lines on the same plot, in the hope that the overlay reveals structure hidden behind the time-series factor.

Figure 3: Overlay regression lines of IC vs.
Temp by Year
[Overlay plot of IC (vertical axis, 0.25 to 0.55) against Temp (horizontal axis, 20 to 80). Observations are plotted with the symbols 1, 2, 3 for Year 1951, Year 1952, Year 1953, and one fitted regression line is drawn for each year.]

Conclusion:
Figure 3 shows clearly that ice cream consumption increases in hot weather and decreases on cold days. Moreover, ice cream consumption increased year after year from March 18, 1951, at least for the 3 years of the study, as indicated by Figure 3. Although our sample is relatively small, we believe this is not a coincidence, because the ice cream industry is prosperous in this day and age. For instance, Dreyer's Grand Ice Cream Company introduces new ice cream flavors every year and has had more than a billion dollars in annual revenue (Reference 2). In this sense, history supports our result.

Summary:
The main factor that influences ice cream consumption is temperature. We may expect ice cream sales to be higher in the summer and lower in the winter. The demand for ice cream increased in each year of the study.

Reference:
1. The Data and Story Library, Cornell University, NY http://lib.stat.cmu.edu/DASL/Datafiles/IceCream.html
2. Dreyer’s Grand Ice Cream Holdings, Inc. http://www.dreyersinc.com/about/index.asp

Appendix 1 (Codebook):
1. Date: Time period (1-30) of the study (from 3/18/51 to 7/11/53)
2. IC: Ice cream consumption in pints per capita
3. Price: Price of ice cream per pint in dollars
4. Income: Weekly family income in dollars
5. Temp: Mean temperature in degrees Fahrenheit (°F)
6.
Year: Year within the study (1 = 1951, 2 = 1952, 3 = 1953)

Appendix 2 (Data):
Obs   Date   IC      Price   Income   Temp   Year
  1      1   0.386   0.270       78     41      1
  2      2   0.374   0.282       79     56      1
  3      3   0.393   0.277       81     63      1
  4      4   0.425   0.280       80     68      1
  5      5   0.406   0.272       76     69      1
  6      6   0.344   0.262       78     65      1
  7      7   0.327   0.275       82     61      1
  8      8   0.288   0.267       79     47      1
  9      9   0.269   0.265       76     32      1
 10     10   0.256   0.277       79     24      1
 11     11   0.286   0.282       82     28      2
 12     12   0.298   0.270       85     26      2
 13     13   0.329   0.272       86     32      2
 14     14   0.318   0.287       83     40      2
 15     15   0.381   0.277       84     55      2
 16     16   0.381   0.287       82     63      2
 17     17   0.470   0.280       80     72      2
 18     18   0.443   0.277       78     72      2
 19     19   0.386   0.277       84     67      2
 20     20   0.342   0.277       86     60      2
 21     21   0.319   0.292       85     44      2
 22     22   0.307   0.287       87     40      2
 23     23   0.284   0.277       94     32      2
 24     24   0.326   0.285       92     27      3
 25     25   0.309   0.282       95     28      3
 26     26   0.359   0.265       96     33      3
 27     27   0.376   0.265       94     41      3
 28     28   0.416   0.265       96     52      3
 29     29   0.437   0.268       91     64      3
 30     30   0.548   0.260       90     71      3

Appendix 3 (SAS code):
/* Read the 30 four-week observations */
data ice_cream;
  input Date IC Price Income Temp Year;
  datalines;
1 .386 .270 78 41 1
2 .374 .282 79 56 1
3 .393 .277 81 63 1
4 .425 .280 80 68 1
5 .406 .272 76 69 1
6 .344 .262 78 65 1
7 .327 .275 82 61 1
8 .288 .267 79 47 1
9 .269 .265 76 32 1
10 .256 .277 79 24 1
11 .286 .282 82 28 2
12 .298 .270 85 26 2
13 .329 .272 86 32 2
14 .318 .287 83 40 2
15 .381 .277 84 55 2
16 .381 .287 82 63 2
17 .470 .280 80 72 2
18 .443 .277 78 72 2
19 .386 .277 84 67 2
20 .342 .277 86 60 2
21 .319 .292 85 44 2
22 .307 .287 87 40 2
23 .284 .277 94 32 2
24 .326 .285 92 27 3
25 .309 .282 95 28 3
26 .359 .265 96 33 3
27 .376 .265 94 41 3
28 .416 .265 96 52 3
29 .437 .268 91 64 3
30 .548 .260 90 71 3
;

proc print data = ice_cream;
  title 'Data for Ice Cream Consumption';
run; quit;

/* Box-plots of each candidate variable by Year */
proc boxplot data = ice_cream;
  title 'Boxplot for Income vs. Year';
  plot Income*Year;
run;

proc boxplot data = ice_cream;
  title 'Boxplot for Price vs. Year';
  plot Price*Year;
run;

proc boxplot data = ice_cream;
  title 'Boxplot for Temp vs. Year';
  plot Temp*Year;
run;

/* Time-plot with the fitted line of IC on Date (Figure 1) */
proc reg data = ice_cream;
  title 'Regression Model of IC vs. Date';
  model IC = Date;
  plot IC * Date;
run; quit;

/* Backward elimination with a 0.05 stay level */
proc reg data = ice_cream;
  model IC = Price Income Temp / selection = backward sls = .05 cp mse;
run; quit;

/* Final model and normal quantile plot of the residuals (Figure 2) */
proc reg data = ice_cream;
  title 'Predicting Ice Cream Consumption by Temperature and Income';
  model IC = Temp Income;
run;
  title 'Residual plot for IC by Temp and Income';
  plot residual.*nqq.;
run; quit;

/* Separate regression of IC on Temp for each year */
proc sort data=ice_cream;
  by year;
run;

proc reg data=ice_cream;
  by year;
  title 'Predicting Ice Cream Consumption from Temperature by Year';
  model IC = Temp;
  output out=resids p=Fitted_IC;
run; quit;

proc print data=resids;
run; quit;

proc sort data=resids;
  by year Temp;
run;

/* Split the observed and fitted values into one set of columns per year */
data resids;
  set resids;
  if year=1 then do; IC1=IC; Temp1=Temp; Fit1=Fitted_IC; end;
  if year=2 then do; IC2=IC; Temp2=Temp; Fit2=Fitted_IC; end;
  if year=3 then do; IC3=IC; Temp3=Temp; Fit3=Fitted_IC; end;
run;

proc sort data=resids;
  by year Temp;
run;

proc print data=resids;
run; quit;

/* Symbols 1-3 mark the observations; symbols 4-6 draw the fitted lines */
symbol1 cv=red   value='1' i=none;
symbol2 cv=blue  value='2' i=none;
symbol3 cv=black value='3' i=none;
symbol4 cv=red   value=none i=join ci=red   line=1;
symbol5 cv=blue  value=none i=join ci=blue  line=2;
symbol6 cv=black value=none i=join ci=black line=3;

/* Overlay plot of the three yearly regressions (Figure 3) */
proc gplot data=resids;
  title 'Overlay regression lines of IC vs. Temp by Year';
  plot IC1*Temp1=1 IC2*Temp2=2 IC3*Temp3=3
       Fit1*temp1=4 fit2*temp2=5 fit3*temp3=6 / overlay legend;
run; quit;
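As a possible complement to the by-year regressions in the code above, the sketch below shows two checks that are not part of the original analysis. The first requests a Durbin-Watson test on the residuals of the final model, since the observations form a time series. The second fits one common Temp slope with a separate level for each year, so that the year-to-year shift suggested by the overlay plot can be tested directly. It assumes the ice_cream data set from Appendix 3 with Year coded 1, 2, 3; the titles are illustrative only.

```sas
/* Sketch only: assumes the ice_cream data set created in Appendix 3. */

/* 1. Durbin-Watson statistic and p-values for first-order
      autocorrelation in the residuals of the final model. */
proc reg data = ice_cream;
  title 'Autocorrelation check for IC by Temp and Income';
  model IC = Temp Income / dwprob;
run; quit;

/* 2. Common Temp slope with a separate intercept for each year.
      The Type III test for Year asks whether the yearly lines
      differ in level after adjusting for temperature. */
proc glm data = ice_cream;
  title 'IC by Temp with a Year effect';
  class Year;
  model IC = Temp Year / solution ss3;
run; quit;
```

A small p-value for Year in the second step would be consistent with the upward shift between years seen in Figure 3, although with only 30 observations such a result should be read with caution.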