Predicting Ice Cream Consumption by Temperature and Income

Ice Cream Consumption
Introduction: Ice cream is one of the major frozen desserts on the market. Many people enjoy it while watching their favorite television programs or after dinner. Yet what are the key factors that affect how much ice cream people consume? In this paper we investigate the potential factors. The data were collected from March 18, 1951 to July 11, 1953, a total of 30 four-week periods. The potential variables are the price of ice cream (Price), the weekly family income of the consumers (Income), and the temperature (Temp).
Methodology: Since we are interested in the key factors that affect ice cream consumption, we will conduct several tests to identify and examine each potential variable. Because the data were collected over time, we will first draw a time plot to reveal the relationship between ice cream consumption and time. Then we will use box plots to identify outlying samples. Next, we will use the backward selection method to remove irrelevant variables. Finally, we will fit the regression model to obtain an equation for estimating ice cream consumption.
Analysis: Before conducting any statistical tests, we note that the ice cream consumption data are time-series data: the samples were collected every 4 weeks for 30 consecutive periods. We therefore draw a time plot to check for patterns over the observation period and, at the same time, fit the regression model of ice cream consumption (IC) vs. Date.
Figure 1: Regression Model of IC vs. Date
[Time plot of IC (0.25 to 0.55) against Date (0 to 30) with fitted line IC = 0.3337 + 0.0017 Date; N = 30, Rsq = 0.0492, AdjRsq = 0.0152, RMSE = 0.0653]
By observation, the time plot displays a noticeable pattern: the samples rise and fall over time. We believe this reflects a seasonal factor, because ice cream consumption was higher during the summer and lower in the winter. Additionally, the regression line shows an increasing trend over the observation period, although Date alone explains only about 5% of the variation (Rsq = 0.0492). As a result, we believe the time-series factor 'Date' may explain some of the change in mean ice cream consumption.
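Figure 1's fitted line can be reproduced outside SAS. The sketch below is a plain-Python (standard library only) least-squares fit of IC on Date using the Appendix 2 data; `simple_ols` is a hypothetical helper name, not part of the original analysis.

```python
# Appendix 2: ice cream consumption (pints per capita) for Date = 1, ..., 30.
IC = [0.386, 0.374, 0.393, 0.425, 0.406, 0.344, 0.327, 0.288, 0.269, 0.256,
      0.286, 0.298, 0.329, 0.318, 0.381, 0.381, 0.470, 0.443, 0.386, 0.342,
      0.319, 0.307, 0.284, 0.326, 0.309, 0.359, 0.376, 0.416, 0.437, 0.548]
DATE = list(range(1, 31))


def simple_ols(x, y):
    """Least-squares fit of y = a + b*x; returns (a, b, R^2)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope, sxy ** 2 / (sxx * syy)


intercept, slope, r2 = simple_ols(DATE, IC)
print(f"IC = {intercept:.4f} + {slope:.4f} * Date, R^2 = {r2:.4f}")
# prints: IC = 0.3337 + 0.0017 * Date, R^2 = 0.0492
```

The small R² shows that a straight-line trend in Date explains little of the variation on its own; most of the movement in the time plot is the seasonal rise and fall.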
Since the time-series factor affects the model, we will screen for outliers within each 'Year'. Hence, we draw 3 separate box plots: Income vs. Year, Price vs. Year, and Temp vs. Year.
Box-plot 1 (Income vs. Year): [box plots of Income (75 to 100) for Years 1, 2, 3]
Box-plot 2 (Price vs. Year): [box plots of Price (0.26 to 0.30) for Years 1, 2, 3]
Box-plot 3 (Temp vs. Year): [box plots of Temp (20 to 80) for Years 1, 2, 3]
According to the box plots above, no outliers are marked for any of the 3 potential variables. Therefore, we move on to the backward selection method to screen out unnecessary variables.
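A common box-plot rule marks a point as a potential outlier when it lies more than 1.5 times the interquartile range (IQR) beyond the first or third quartile. The Python sketch below applies that rule within each year. It assumes the 1/2/3 Year coding used by the SAS code in Appendix 3 and Python's default `statistics.quantiles` interpolation, which may differ slightly from the quartile definition SAS uses; `fence_outliers` is a hypothetical helper.

```python
from statistics import quantiles

# Appendix 2 data; Year uses the 1/2/3 coding of the SAS code in Appendix 3.
YEAR = [1] * 10 + [2] * 13 + [3] * 7
VARIABLES = {
    "Price": [0.270, 0.282, 0.277, 0.280, 0.272, 0.262, 0.275, 0.267, 0.265, 0.277,
              0.282, 0.270, 0.272, 0.287, 0.277, 0.287, 0.280, 0.277, 0.277, 0.277,
              0.292, 0.287, 0.277, 0.285, 0.282, 0.265, 0.265, 0.265, 0.268, 0.260],
    "Income": [78, 79, 81, 80, 76, 78, 82, 79, 76, 79, 82, 85, 86, 83, 84,
               82, 80, 78, 84, 86, 85, 87, 94, 92, 95, 96, 94, 96, 91, 90],
    "Temp": [41, 56, 63, 68, 69, 65, 61, 47, 32, 24, 28, 26, 32, 40, 55,
             63, 72, 72, 67, 60, 44, 40, 32, 27, 28, 33, 41, 52, 64, 71],
}


def fence_outliers(values, k=1.5):
    """Values outside [Q1 - k*IQR, Q3 + k*IQR], i.e. beyond Tukey's box-plot fences."""
    q1, _, q3 = quantiles(values, n=4)  # Python's default 'exclusive' quartiles
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


flagged = []
for name, column in VARIABLES.items():
    for year in (1, 2, 3):
        group = [v for v, yr in zip(column, YEAR) if yr == year]
        flagged += [(name, year, v) for v in fence_outliers(group)]
print(flagged)  # [('Income', 2, 94)]
```

On the Appendix 2 data this rule flags one value, the Year 2 income of 94 (observation 23), which lies just above that year's upper fence of 92. Whether a box plot draws such a point as an outlier depends on its whisker style: a skeletal box plot, whose whiskers run to the minimum and maximum, marks no outliers at all.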
The purpose of this study is to determine the main factors in ice cream consumption. Therefore, we run backward selection to eliminate ineffective variables among 'Price', 'Income', and 'Temp' in order to obtain a better model of ice cream consumption. At each step, the variable with the largest p-value is removed if that p-value exceeds the stay level of 0.05 (sls = .05 in Appendix 3).
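The elimination loop can be illustrated with the standard-library Python sketch below, run on the Appendix 2 data. It starts from the full model and, at each step, drops the variable whose partial F test has the largest p-value if that p-value exceeds 0.05. To stay self-contained it solves the normal equations directly and approximates F(1, df) p-values by integrating the t density with Simpson's rule. `sse`, `f1_pvalue`, and `backward_eliminate` are hypothetical helper names, and this is an illustration rather than a substitute for PROC REG.

```python
import math

# Appendix 2 data (30 four-week periods).
IC = [0.386, 0.374, 0.393, 0.425, 0.406, 0.344, 0.327, 0.288, 0.269, 0.256,
      0.286, 0.298, 0.329, 0.318, 0.381, 0.381, 0.470, 0.443, 0.386, 0.342,
      0.319, 0.307, 0.284, 0.326, 0.309, 0.359, 0.376, 0.416, 0.437, 0.548]
CANDIDATES = {
    "Price": [0.270, 0.282, 0.277, 0.280, 0.272, 0.262, 0.275, 0.267, 0.265, 0.277,
              0.282, 0.270, 0.272, 0.287, 0.277, 0.287, 0.280, 0.277, 0.277, 0.277,
              0.292, 0.287, 0.277, 0.285, 0.282, 0.265, 0.265, 0.265, 0.268, 0.260],
    "Income": [78, 79, 81, 80, 76, 78, 82, 79, 76, 79, 82, 85, 86, 83, 84,
               82, 80, 78, 84, 86, 85, 87, 94, 92, 95, 96, 94, 96, 91, 90],
    "Temp": [41, 56, 63, 68, 69, 65, 61, 47, 32, 24, 28, 26, 32, 40, 55,
             63, 72, 72, 67, 60, 44, 40, 32, 27, 28, 33, 41, 52, 64, 71],
}


def sse(y, columns):
    """Residual sum of squares after an OLS fit of y on an intercept plus `columns`."""
    rows = [[1.0] + [float(col[i]) for col in columns] for i in range(len(y))]
    p = len(rows[0])
    # Augmented normal equations [X'X | X'y], solved by Gaussian elimination with pivoting.
    a = [[sum(r[j] * r[k] for r in rows) for k in range(p)]
         + [sum(r[j] * yi for r, yi in zip(rows, y))] for j in range(p)]
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(a[r][c]))
        a[c], a[pivot] = a[pivot], a[c]
        for r in range(c + 1, p):
            factor = a[r][c] / a[c][c]
            a[r] = [v - factor * w for v, w in zip(a[r], a[c])]
    beta = [0.0] * p
    for j in reversed(range(p)):
        beta[j] = (a[j][p] - sum(a[j][k] * beta[k] for k in range(j + 1, p))) / a[j][j]
    return sum((yi - sum(b * x for b, x in zip(beta, r))) ** 2 for r, yi in zip(rows, y))


def f1_pvalue(f, df):
    """P(F > f) for F ~ F(1, df), i.e. the two-sided t-test p-value for t = sqrt(f)."""
    t = math.sqrt(max(f, 0.0))
    const = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)
    pdf = lambda x: const * (1 + x * x / df) ** (-(df + 1) / 2)
    n = 2000  # even number of Simpson's-rule intervals on [0, t]
    h = t / n
    area = (pdf(0) + pdf(t) + sum((4 if i % 2 else 2) * pdf(i * h) for i in range(1, n))) * h / 3
    return max(0.0, 1.0 - 2.0 * area)


def backward_eliminate(y, candidates, sls=0.05):
    """Drop the least significant variable until every remaining p-value is <= sls."""
    kept, removed = list(candidates), []
    while kept:
        sse_full = sse(y, [candidates[v] for v in kept])
        df_error = len(y) - len(kept) - 1
        mse = sse_full / df_error
        tests = {}
        for v in kept:
            sse_reduced = sse(y, [candidates[w] for w in kept if w != v])
            f = (sse_reduced - sse_full) / mse  # partial F with 1 numerator df
            tests[v] = (f, f1_pvalue(f, df_error))
        worst = max(kept, key=lambda v: tests[v][1])
        if tests[worst][1] <= sls:
            break
        removed.append((worst,) + tests[worst])
        kept.remove(worst)
    return kept, removed


kept, removed = backward_eliminate(IC, CANDIDATES)
for name, f, p in removed:
    print(f"removed {name}: F = {f:.2f}, p = {p:.4f}")
print("kept:", kept)
```

On these data the sketch removes Price first (partial F of about 1.57, p of about 0.22) and then stops with Income and Temp, which agrees with the SAS steps reported in this section.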
The SAS output of the backward elimination follows.
Backward Elimination: Step 0
All Variables Entered: R-Square = 0.7190 and C(p) = 4.0000

                         Analysis of Variance
                             Sum of        Mean
Source             DF       Squares      Square    F Value    Pr > F
Model               3       0.09025     0.03008      22.17    <.0001
Error              26       0.03527     0.00136
Corrected Total    29       0.12552

              Parameter     Standard
Variable       Estimate        Error    Type II SS    F Value    Pr > F
Intercept       0.19732      0.27022    0.00072338       0.53    0.4718
Price          -1.04441      0.83436       0.00213       1.57    0.2218
Income          0.00331      0.00117       0.01082       7.97    0.0090
Temp            0.00346   0.00044555       0.08174      60.25    <.0001

Bounds on condition number: 1.1444, 9.9727
------------------------------------------------------------------------
Backward Elimination: Step 1
Variable Price Removed: R-Square = 0.7021 and C(p) = 3.5669

                         Analysis of Variance
                             Sum of        Mean
Source             DF       Squares      Square    F Value    Pr > F
Model               2       0.08812     0.04406      31.81    <.0001
Error              27       0.03740     0.00139
Corrected Total    29       0.12552

              Parameter     Standard
Variable       Estimate        Error    Type II SS    F Value    Pr > F
Intercept      -0.11320      0.10828       0.00151       1.09    0.3051
Income          0.00353      0.00117       0.01261       9.10    0.0055
Temp            0.00354   0.00044496       0.08784      63.41    <.0001

Bounds on condition number: 1.1179, 4.4715
------------------------------------------------------------------------
All variables left in the model are significant at the 0.0500 level.

                     Summary of Backward Elimination
        Variable   Number    Partial     Model
Step    Removed    Vars In   R-Square    R-Square    C(p)     F Value   Pr > F
   1    Price            2     0.0169      0.7021    3.5669      1.57   0.2218
As a result of the backward selection, the variable 'Price' has been removed. R² dropped only slightly, from 0.7190 to 0.7021 (a partial R² of 0.0169), while the overall F value increased from 22.17 to 31.81. Price was not significant in the full model (p = 0.2218 > 0.05), so dropping it costs little explanatory power and leaves a simpler model in which every remaining variable is significant. Therefore, we use this two-variable model to predict ice cream consumption.
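The quantities compared above can be checked by hand from the two ANOVA tables. Each R² is the model sum of squares divided by the corrected total, and the partial F statistic for Price is the increase in error sum of squares when Price is dropped, divided by the full model's mean square error:

```latex
\begin{aligned}
R^2_{\text{full}} &= \frac{0.09025}{0.12552} \approx 0.7190, &
R^2_{\text{reduced}} &= \frac{0.08812}{0.12552} \approx 0.7021,\\
\text{partial } R^2_{\text{Price}} &= 0.7190 - 0.7021 = 0.0169, &
F_{\text{Price}} &= \frac{0.03740 - 0.03527}{0.00136} \approx 1.57.
\end{aligned}
```

With F = 1.57 on 1 and 26 degrees of freedom (p = 0.2218), Price is not significant at the 0.05 level, which is why it is the variable removed.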
Predicting Ice Cream Consumption by Temperature and Income
The REG Procedure
Dependent Variable: IC

                         Analysis of Variance
                             Sum of        Mean
Source             DF       Squares      Square    F Value    Pr > F
Model               2       0.08812     0.04406      31.81    <.0001
Error              27       0.03740     0.00139
Corrected Total    29       0.12552

Root MSE           0.03722    R-Square    0.7021
Dependent Mean     0.35943    Adj R-Sq    0.6800
Coeff Var         10.35446

                         Parameter Estimates
                  Parameter     Standard
Variable    DF     Estimate        Error    t Value    Pr > |t|
Intercept    1     -0.11320      0.10828      -1.05      0.3051
Temp         1      0.00354   0.00044496       7.96      <.0001
Income       1      0.00353      0.00117       3.02      0.0055
According to the regression procedure above, we obtain the following regression model to estimate ice cream consumption:
IC = -0.11320 + 0.00354 * Temp + 0.00353 * Income
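As a quick illustration of reading this equation, the Python sketch below wraps it in a hypothetical `predict_ic` function and compares a warm and a cold period at the same income. The inputs (70 °F, 30 °F, and $90 per week) are arbitrary values chosen inside the observed ranges (Temp 24 to 72, Income 76 to 96); predictions far outside those ranges would be extrapolations.

```python
def predict_ic(temp_f, income):
    """Predicted pints per capita from IC = -0.11320 + 0.00354*Temp + 0.00353*Income.

    temp_f: mean temperature in degrees Fahrenheit; income: weekly family income in dollars.
    """
    return -0.11320 + 0.00354 * temp_f + 0.00353 * income


warm = predict_ic(70, 90)
cold = predict_ic(30, 90)
print(f"{warm:.4f} {cold:.4f} {warm - cold:.4f}")  # 0.4523 0.3107 0.1416
```

Because the model is linear, the 40-degree difference translates into 40 × 0.00354 = 0.1416 pints per capita regardless of the income level used.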
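The following figure plots the residuals of the regression model against normal quantiles. The points show an increasing trend; note, however, that in a normal quantile plot the sorted residuals always rise from left to right, so it is curvature or other departure from a straight line, not the rise itself, that would signal non-normal errors.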
Figure 2: Residual plot for IC by Temp and Income
[Residuals (-0.075 to 0.100) plotted against Normal Quantile (-3 to 3) for the model IC = -0.1132 + 0.0035 Temp + 0.0035 Income; N = 30, Rsq = 0.7021, AdjRsq = 0.6800, RMSE = 0.0372]
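The coordinates of a plot like Figure 2 can be sketched in Python as below. The sketch uses the printed (rounded) coefficients, Blom's plotting positions (i - 0.375)/(n + 0.25) for the normal quantiles (an assumed convention; SAS may use a slightly different one), and summarizes how straight the plot is with the correlation between sorted residuals and normal quantiles.

```python
from statistics import NormalDist, fmean

# Appendix 2 data.
IC = [0.386, 0.374, 0.393, 0.425, 0.406, 0.344, 0.327, 0.288, 0.269, 0.256,
      0.286, 0.298, 0.329, 0.318, 0.381, 0.381, 0.470, 0.443, 0.386, 0.342,
      0.319, 0.307, 0.284, 0.326, 0.309, 0.359, 0.376, 0.416, 0.437, 0.548]
TEMP = [41, 56, 63, 68, 69, 65, 61, 47, 32, 24, 28, 26, 32, 40, 55,
        63, 72, 72, 67, 60, 44, 40, 32, 27, 28, 33, 41, 52, 64, 71]
INCOME = [78, 79, 81, 80, 76, 78, 82, 79, 76, 79, 82, 85, 86, 83, 84,
          82, 80, 78, 84, 86, 85, 87, 94, 92, 95, 96, 94, 96, 91, 90]

# Residuals from the printed (rounded) coefficients, sorted for the Q-Q plot.
residuals = sorted(y - (-0.11320 + 0.00354 * t + 0.00353 * m)
                   for y, t, m in zip(IC, TEMP, INCOME))
n = len(residuals)
# Blom plotting positions (assumed convention) mapped to standard normal quantiles.
normal_q = [NormalDist().inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]

# Straightness check: Pearson correlation of sorted residuals with normal quantiles.
mr, mq = fmean(residuals), fmean(normal_q)
cov = sum((r - mr) * (q - mq) for r, q in zip(residuals, normal_q))
var_r = sum((r - mr) ** 2 for r in residuals)
var_q = sum((q - mq) ** 2 for q in normal_q)
qq_corr = cov / (var_r * var_q) ** 0.5
print(f"Q-Q correlation = {qq_corr:.3f}")
```

A correlation close to 1 means the points lie near a straight line, which is what approximately normal errors would produce.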
In the regression model, we notice that the variable 'Temp' depends on the change of season. To provide concrete evidence that the time-series factor 'Date' contributes an increasing trend to ice cream consumption, we fit the regression of IC vs. Temp separately for each Year. We then superimpose these 3 regression lines on the same plot, in the hope that the overlay reveals the effect hidden behind the time-series factor.
Figure 3: Overlay regression lines of IC vs. Temp by Year
[IC (0.25 to 0.55) plotted against Temp (20 to 80); observations marked 1, 2, 3 by year, with fitted lines for Year 1951, Year 1952, and Year 1953]
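The per-year fits behind Figure 3 can be sketched in Python as below, assuming the 1/2/3 Year coding from the SAS code in Appendix 3; `line_fit` is a hypothetical helper. The sketch prints each year's fitted IC at 30, 50, and 70 °F so the vertical ordering of the three lines can be read off directly.

```python
# Appendix 2 data; Year uses the 1/2/3 coding from the SAS code in Appendix 3.
IC = [0.386, 0.374, 0.393, 0.425, 0.406, 0.344, 0.327, 0.288, 0.269, 0.256,
      0.286, 0.298, 0.329, 0.318, 0.381, 0.381, 0.470, 0.443, 0.386, 0.342,
      0.319, 0.307, 0.284, 0.326, 0.309, 0.359, 0.376, 0.416, 0.437, 0.548]
TEMP = [41, 56, 63, 68, 69, 65, 61, 47, 32, 24, 28, 26, 32, 40, 55,
        63, 72, 72, 67, 60, 44, 40, 32, 27, 28, 33, 41, 52, 64, 71]
YEAR = [1] * 10 + [2] * 13 + [3] * 7


def line_fit(x, y):
    """Least-squares intercept and slope of y = a + b*x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return mean_y - sxy / sxx * mean_x, sxy / sxx


fits = {}
for year in (1, 2, 3):
    temps = [t for t, yr in zip(TEMP, YEAR) if yr == year]
    ics = [v for v, yr in zip(IC, YEAR) if yr == year]
    fits[year] = line_fit(temps, ics)

# Compare the three fitted lines at a few temperatures within the observed range.
for temp in (30, 50, 70):
    row = ", ".join(f"Year {yr}: {a + b * temp:.3f}" for yr, (a, b) in fits.items())
    print(f"Temp {temp} F -> {row}")
```

On these data the Year 3 (1953) line lies above the Year 2 line, which in turn lies above the Year 1 line, across the plotted temperature range. Income also rose over the study period (Appendix 2), so some of this upward shift may reflect income rather than time itself.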
Conclusion: Figure 3 clearly shows that ice cream consumption increases when the weather is hot and decreases on cold days. Moreover, ice cream consumption increased year after year from March 18, 1951, at least over the 3 years of the study, as indicated by Figure 3. Although our sample is relatively small, we believe this is not a coincidence, because the ice cream industry is prospering today. For instance, Dreyer's Grand Ice Cream Company introduces new ice cream flavors every year and has had more than a billion dollars in annual revenue (Reference 2). The industry's later growth is therefore consistent with our test results.
Summary: The main factor that influences ice cream consumption is temperature; weekly family income also has a significant positive effect (p = 0.0055). We may expect ice cream sales to be higher in the summer and lower in the winter. Over the study period, demand for ice cream increased from year to year.
References:
1. The Data and Story Library, Cornell University, NY. http://lib.stat.cmu.edu/DASL/Datafiles/IceCream.html
2. Dreyer's Grand Ice Cream Holdings, Inc. http://www.dreyersinc.com/about/index.asp
Appendix 1 (Codebook):
1. Date: Time period (1 to 30) of the study (from 3/18/51 to 7/11/53)
2. IC: Ice cream consumption in pints per capita
3. Price: Price of ice cream per pint in dollars
4. Income: Weekly family income in dollars
5. Temp: Mean temperature in degrees Fahrenheit (°F)
6. Year: Year within the study (1 = 1951, 2 = 1952, 3 = 1953)
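The codebook does not say how the four-week periods that straddle New Year were assigned to a year. One rule that reproduces the 10/13/7 split used in the SAS code is to code each period by the calendar year in which it ends. The Python sketch below checks this, assuming back-to-back 28-day periods starting on March 18, 1951; the rule and the helper names `period_end` and `year_code` are hypothetical, not taken from the original study.

```python
from datetime import date, timedelta

START = date(1951, 3, 18)  # first day of period 1, from the codebook


def period_end(k):
    """Last day of four-week period k (k = 1, ..., 30), assuming consecutive 28-day periods."""
    return START + timedelta(days=28 * k - 1)


def year_code(k):
    """Hypothetical rule: 1, 2, 3 for periods ending in 1951, 1952, 1953."""
    return period_end(k).year - 1950


codes = [year_code(k) for k in range(1, 31)]
print(codes.count(1), codes.count(2), codes.count(3))  # 10 13 7
```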
Appendix 2 (Data):
Obs  Date     IC    Price  Income  Temp
  1     1  0.386  0.270      78    41
  2     2  0.374  0.282      79    56
  3     3  0.393  0.277      81    63
  4     4  0.425  0.280      80    68
  5     5  0.406  0.272      76    69
  6     6  0.344  0.262      78    65
  7     7  0.327  0.275      82    61
  8     8  0.288  0.267      79    47
  9     9  0.269  0.265      76    32
 10    10  0.256  0.277      79    24
 11    11  0.286  0.282      82    28
 12    12  0.298  0.270      85    26
 13    13  0.329  0.272      86    32
 14    14  0.318  0.287      83    40
 15    15  0.381  0.277      84    55
 16    16  0.381  0.287      82    63
 17    17  0.470  0.280      80    72
 18    18  0.443  0.277      78    72
 19    19  0.386  0.277      84    67
 20    20  0.342  0.277      86    60
 21    21  0.319  0.292      85    44
 22    22  0.307  0.287      87    40
 23    23  0.284  0.277      94    32
 24    24  0.326  0.285      92    27
 25    25  0.309  0.282      95    28
 26    26  0.359  0.265      96    33
 27    27  0.376  0.265      94    41
 28    28  0.416  0.265      96    52
 29    29  0.437  0.268      91    64
 30    30  0.548  0.260      90    71
Appendix 3 (SAS code):
/* Read the 30 four-week periods: Date, IC, Price, Income, Temp, Year (1 = 1951, 2 = 1952, 3 = 1953). */
data ice_cream;
  input Date IC Price Income Temp Year;
  datalines;
1 .386 .270 78 41 1
2 .374 .282 79 56 1
3 .393 .277 81 63 1
4 .425 .280 80 68 1
5 .406 .272 76 69 1
6 .344 .262 78 65 1
7 .327 .275 82 61 1
8 .288 .267 79 47 1
9 .269 .265 76 32 1
10 .256 .277 79 24 1
11 .286 .282 82 28 2
12 .298 .270 85 26 2
13 .329 .272 86 32 2
14 .318 .287 83 40 2
15 .381 .277 84 55 2
16 .381 .287 82 63 2
17 .470 .280 80 72 2
18 .443 .277 78 72 2
19 .386 .277 84 67 2
20 .342 .277 86 60 2
21 .319 .292 85 44 2
22 .307 .287 87 40 2
23 .284 .277 94 32 2
24 .326 .285 92 27 3
25 .309 .282 95 28 3
26 .359 .265 96 33 3
27 .376 .265 94 41 3
28 .416 .265 96 52 3
29 .437 .268 91 64 3
30 .548 .260 90 71 3
;
run;

proc print data = ice_cream;
  title 'Data for Ice Cream Consumption';
run;

/* Box plots of each candidate predictor by year (the data are already in Year order). */
proc boxplot data = ice_cream;
  title 'Boxplot for Income vs. Year';
  plot Income*Year;
run;
proc boxplot data = ice_cream;
  title 'Boxplot for Price vs. Year';
  plot Price*Year;
run;
proc boxplot data = ice_cream;
  title 'Boxplot for Temp vs. Year';
  plot Temp*Year;
run;

/* Time trend: simple regression of IC on Date (Figure 1). */
proc reg data = ice_cream;
  title 'Regression Model of IC vs. Date';
  model IC = Date;
  plot IC * Date;
run; quit;

/* Backward elimination from Price, Income, Temp with a 0.05 stay level. */
proc reg data = ice_cream;
  model IC = Price Income Temp / selection = backward sls = .05 cp mse;
run; quit;

/* Final model, then a normal quantile plot of its residuals (Figure 2). */
proc reg data = ice_cream;
  title 'Predicting Ice Cream Consumption by Temperature and Income';
  model IC = Temp Income;
run;
  title 'Residual plot for IC by Temp and Income';
  plot residual.*nqq.;
run; quit;

/* Separate IC-on-Temp regressions for each year, saving the fitted values. */
proc sort data = ice_cream; by Year; run;
proc reg data = ice_cream;
  by Year;
  title 'Predicting Ice Cream Consumption from Temperature by Year';
  model IC = Temp;
  output out = resids p = Fitted_IC;
run; quit;
proc print data = resids; run;

/* Put each year's points and fitted values in their own columns for overlaying (Figure 3). */
proc sort data = resids; by Year Temp; run;
data resids;
  set resids;
  if Year = 1 then do; IC1 = IC; Temp1 = Temp; Fit1 = Fitted_IC; end;
  if Year = 2 then do; IC2 = IC; Temp2 = Temp; Fit2 = Fitted_IC; end;
  if Year = 3 then do; IC3 = IC; Temp3 = Temp; Fit3 = Fitted_IC; end;
run;
proc print data = resids; run;

/* Symbols 1-3 plot the year labels as points; symbols 4-6 join the fitted values into lines. */
symbol1 cv=red   value='1'  i=none;
symbol2 cv=blue  value='2'  i=none;
symbol3 cv=black value='3'  i=none;
symbol4 cv=red   value=none i=join ci=red   line=1;
symbol5 cv=blue  value=none i=join ci=blue  line=2;
symbol6 cv=black value=none i=join ci=black line=3;
proc gplot data = resids;
  title 'Overlay regression lines of IC vs. Temp by Year';
  plot IC1*Temp1=1 IC2*Temp2=2 IC3*Temp3=3
       Fit1*Temp1=4 Fit2*Temp2=5 Fit3*Temp3=6 / overlay legend;
run; quit;