The Challenges of Forecasting Demand for E-Commerce

Walton College of Business, University of Arkansas
Undergraduate Student Name: Arley Bejerano
Phone: (479) 268 0361
e-mail: abejeran@uark.edu
BSBA Supply Chain Management
BSBA Economics
ABSTRACT
E-commerce behaves differently from the traditional retailing industry. Customers’ online
purchase experience requires less seller-buyer interaction, and customer satisfaction depends on
factors that are outside the control of the online selling company. A huge customer base,
changing seasonal patterns, lack of historical data for new products, disruption in social and
behavioral norms, shifting customer buying criteria, and disruptions in demand patterns from
competitors make forecasting a very difficult task. Determining which forecast model to apply is
difficult, and measures of forecast accuracy are not always reliable.
E-COMMERCE IN PERSPECTIVE
The e-commerce industry is growing considerably. According to Statista.com, a reliable
online statistics company, business-to-consumer (B2C) e-commerce sales are expected to
grow 15.6% worldwide this year and approximately 13% in 2016. The number of online stores is
also growing as improvements in technology change the mobility, data interchange, and
accessibility in today’s agile and dynamic business environment. Existing e-tailers[i] like
Amazon.com and Alibaba.com[ii] are constantly looking for ways to stay ahead in this race to
achieve and maintain competitive advantage. As a result, the level of competitiveness in this
industry is rapidly changing. Furthermore, one aspect of great concern throughout the entire
e-commerce industry is the challenge posed by uncertainty in demand planning and inventory
management. Uncertainty is not new; it has challenged replenishment and
sourcing managers for a long time. The e-commerce industry, on the other hand, is fairly new
and rapidly growing. The old rules that apply to traditional retailing practices are not as
applicable to the e-tailer business as one might think. Therefore, demand is a priority on the
agendas of managers at all levels for online sellers. An important question to answer is, “What
forecasting model could more accurately tackle the uncertainties in this industry?” The major
challenges in forecasting demand for e-commerce include the size of the customer base, changing
seasonal patterns, lack of historical data for new products, disruption in social and behavioral
norms, shifting customer buying criteria, and disruptions in demand patterns from competitors
(Forrester Consulting).
The era of digitalization has brought many benefits for buyers and opportunities for
sellers. On the seller’s side, these opportunities do not come without hardships and challenges.
The e-commerce industry has the advantage of great accessibility, almost like an omnipresence
that makes the predictive analytics part of today’s modern business environment practically
unbearable. Nonetheless, some companies have managed to come up with very clever strategies
to gain market share in this endless pool of potential customers. CRM systems, Search Engine
Optimizations, Click and Collect, or even Pay-Per-Click marketing strategies are all part of the
predictive analytics activities that translate into competitive advantage and core competencies.
Everywhere there is a computer or a smartphone with Internet access, there is the
possibility of one or several customers. That is where CRMs can play an important role. E-tailers
store personalized data according to purchasing patterns. It does not matter what
computer an online buyer uses; IP addresses are not always taken into consideration, at least not
as a key factor to locate customers, but rather other cues like the customer’s first and last name,
address (including country), credit card information, etc. There is a current trend of
telecommunication and marketing companies integrating with CRMs in an effort to improve customer
service and increase data collection capabilities. Avaya IP Office is one example that has
been integrated with CRMs (avaya.com). Along with personal information, CRMs keep
records of items bought by every new and repeat customer. This is not enough, though;
customers change their buying patterns too unpredictably for a company to collect the
necessary information on its customer base, especially on a global scale. Yes, forecasts can be
made from historical data on regular customers based on these patterns, but what about new
customers? A time series forecast model lacks the ability to account for exogenous variables
that influence buying decisions in the short run. The answer, then, is social networking and media
marketing (DeMers). How many times do we run into a pop-up window while surfing the net
that reads, “sign in with Google [or] sign in with Facebook”? What do you think that means?
E-tailers, blogs, and even news websites like Forbes.com, Quora.com, the big Amazon.com, and
Alibaba.com, all use this marketing device to access customers’ information. From a demand
forecasting standpoint what they are doing is merely filling in those independent variables that
are not taken into account for new customers. Information is all around us, and social networking
is a paradise for online sellers (Carroll).
ACCURACY IN FORECASTING
Before going further in this analysis of forecasting demand in e-commerce, we need to
acknowledge a simple fact: “forecasts are almost always wrong, if not always” (Production and
Inventory Management). This phrase, popular within the econometrics and forecasting
community, points out the sad but real paradox that surrounds this necessary business practice.
Two questions arise then: How wrong is a forecast, and what model minimizes the errors
associated with the forecast? The differences in supply chain structure between the traditional
retailing business and its younger brother, e-commerce, make it much more important for
managers to predict demand as accurately as ‘humanly’ possible. Customer satisfaction levels in
online sales do not depend on the human interaction between a customer and a sales
representative, or even a cashier and the consumer. The buying experience in online sales
depends more on other factors, like online product availability and short delivery times, among
others. This is truly what customers want in an online purchase experience. All these factors
depend on an accurate forecast of demand.
Forecasting demand and inventory levels accurately is a challenge. Measures of forecast
accuracy are as important and as useful as the forecast itself. As mentioned before, to determine
e-commerce’s future demand, we shouldn’t rely solely on historical data.
A solution to forecasting demand in a more accurate manner would be to integrate
different factors from marketing and qualitative forecasting into a multiple regression analysis.
By combining research methods, like data mining, with the forecaster’s experience, and by
studying economic parameters at the micro level (short-term forecasts) and the macro level
(long-term forecasts), independent variables can be drawn to build a good model. However,
there is only so much a forecaster and an expert can get from these marketing strategies about the
behavior of an irregular customer. Once again, the customer base is too big and the buying
criteria are too spread out across many regions. It is almost impossible to create a model that
encapsulates so many independent variables and outcomes. A solution to this problem would be
to create subgroups according to specific criteria, but by doing so we run into the problem of
multicollinearity[iii] (Hanke and Wichern, 297). In addition, a multiple regression forecast under
the conditions posed by the demand for e-commerce could use ‘dummy variables’ to set the
boundaries between qualitative forecast bias and the dependent variable
(Hanke and Wichern, 297-300). When a multiple regression analysis is combined
with qualitative forecasting techniques, the indicator or dummy variables can nullify
coefficients that are not significant to the model (Hanke and Wichern, 293). Nevertheless, as
mentioned before, this is very difficult due to the dependency on qualitative methods of
determining the regressors or independent variables and the very large customer base.
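As a minimal sketch of the dummy-variable idea discussed above, the following Python example fits a multiple regression by ordinary least squares in which one regressor is a 0/1 indicator. All of the data are hypothetical (a made-up price series and a made-up holiday-week flag), and the demand series is generated from a known rule so that the fitted coefficients can be checked; none of it comes from the e-commerce data used in this paper.

```python
import numpy as np

# Hypothetical data for illustration only; none of it comes from the paper.
price = np.array([10.0, 9.5, 9.0, 10.5, 9.8, 9.2, 10.2, 9.7])
holiday = np.array([0, 0, 1, 0, 1, 1, 0, 0])  # dummy: 1 = holiday week
# Demand generated from a known rule so the fitted coefficients can be checked.
demand = 200.0 - 8.0 * price + 35.0 * holiday


def fit_ols(y, *regressors):
    """Ordinary least squares fit of y on an intercept plus the regressors."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    coefficients, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coefficients


b0, b_price, b_holiday = fit_ols(demand, price, holiday)

# Rough multicollinearity check: correlation between the two regressors.
regressor_corr = np.corrcoef(price, holiday)[0, 1]
```

The coefficient on the dummy variable measures the shift in demand during flagged weeks, holding price fixed. A correlation between regressors close to +1 or -1 would be a warning sign of the multicollinearity problem described above.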
A solution to some of the problems on forecasting demand for e-commerce mentioned
before might be obtained by applying another forecasting model. We have established that time
series data is not the best forecasting technique due to the lack of historical data on new
customers, the seasonal difference across regions and the extremely large customer base issues.
We have also recognized the fact that multiple regression analysis is not a very effective method
of forecasting future e-commerce demand due to an unrealistic dependency on qualitative
selection of independent variables; we will continue to prove this theory. The forecast model that
I offer next is a regression with time series data (Hanke and Wichern, 339-367). This is a
combination of both time series and regression analysis. It takes the best of both models to
predict future demand and leaves us only with the problem of autocorrelation. Why is this model
better than the others? Time series models, including Holt-Winters and the exponentially
weighted moving average, do not include the effects of external factors as causal models do.
The opposite applies to causal models; they do not take historical patterns like seasonality,
cyclicality, trend, and level into account. The challenges of forecasting demand for
e-commerce apply in turn to both time series and causal models. However, if we combine
both, we can reduce or pool the risk in a way that minimizes the forecasting error and improves
measures of accuracy, like the mean absolute percentage error [MAPE], absolute percentage error
[APE], mean square error [MSE], and the correlation coefficient.[iv] Nevertheless, a model like
this, capable of integrating time series data and regression analysis, is sadly going to keep a few
weaknesses from each model. One such defect, which applies specifically to its regression
analysis element, is autocorrelation (Hanke and Wichern, 347).
PROBLEMS AND SOLUTIONS TO AUTOCORRELATION
Autocorrelation brings a series of problems, the first being the omitted variable or model
specification error (Hanke and Wichern, 348). The solution to this challenge would be to
improve the model specification, or simply find the missing variable. This part is not that simple,
because the variable may not be available or it is not quantifiable. We say that it is not
quantifiable when drawing assumptions about relevant regressors from a qualitative standpoint.
We already stated that qualitative forecasting and independent variable selection are very hard
tasks when forecasting demand for e-commerce, because of the lack of historical data on new and
prospective customers and all the other factors previously mentioned. The same problem applies
to the model specification solution for autocorrelation in customer demand for e-commerce
products. The second problem with autocorrelation in this model is addressed by regression with
differences (Hanke and Wichern, 350). In regression with time series data models, we also have
the possibility of running into highly autocorrelated data. A solution to this problem
is, instead of running a regression on the levels of the dependent and independent
variables, to use the difference between the dependent variable at time t and itself lagged one
period. This solution also requires using the differences of the predictors, here $Y_t$ and its
lags $Y_{t-1}, \ldots, Y_{t-k}$; we will see later on that this is not entirely a bad circumstance
(Hanke and Wichern, 350). The third problem with autocorrelation, or serial correlation, is the
possibility of having autocorrelated errors, whose remedy is known as generalized differences
(Hanke and Wichern, 354). This condition is present in a regression analysis with time series
data when $Y_t = \beta_0 + \beta_1 X_t + \varepsilon_t$ and
$\varepsilon_t = \rho\,\varepsilon_{t-1} + \nu_t$ (Hanke and Wichern, 340).
$Y_t$: actual demand for period t
$\beta_0$: intercept coefficient
$\beta_1$: slope coefficient
$X_t$: regressor, in this case a second series or the variable $Y_t$ lagged k times
$\varepsilon_t$: error at time t for big samples or a population
$\rho$: correlation between consecutive errors
$\nu_t$: independent error following a normal distribution with mean 0,
$\nu_t \sim N(0, \sigma_\nu^2)$ (Hanke and Wichern, 340).
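The regression-with-differences remedy described above starts by replacing each series with its period-to-period change. As an illustrative sketch, the following applies that step to the annual U.S. B2C sales figures listed in Table 1 (2002 to 2013, in billions of dollars). The function name `first_differences` is hypothetical, and only the differencing step is shown, not the regression itself.

```python
import numpy as np

# U.S. B2C e-commerce sales, 2002-2013, in billions of dollars (Table 1).
sales = np.array([72, 93, 117, 143, 171, 200, 214, 209, 228, 256, 289, 322],
                 dtype=float)


def first_differences(series):
    """Return Y'_t = Y_t - Y_{t-1}; the result is one observation shorter."""
    return series[1:] - series[:-1]


d_sales = first_differences(sales)
# d_sales[0] is the change from 2002 to 2003 (93 - 72 = 21).
```

Differencing costs one observation, which matters for a series this short: twelve annual values leave only eleven differences to work with.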
When the variance of the error term $u_i$, conditional on $X_i$, is not constant,
the error term is said to be heteroskedastic. That is, the variance of the conditional distribution
increases or decreases with the observation $X_i$. In such circumstances, it becomes
more difficult to conduct a test statistic without mathematically manipulating the error term.
According to FIGURE 1, the error term is indeed heteroskedastic and will interfere with the
Durbin-Watson test statistic. We will see why this is a problem later on when testing for
autocorrelation.
[Figure: chart titled "Heteroscedasticity"; horizontal axis: periods 1 to 12; vertical axis: error term, from -30.00 to 20.00]
FIGURE 1. Conditional distribution of the error term and heteroskedasticity
The solution for this problem of generalized differences, in the available data for e-commerce
demand, is to take the correlation between two consecutive errors into the equation,
$Y'_t = \beta_0(1-\rho) + \beta_1 X'_t + \nu_t$, where $Y'_t = Y_t - \rho Y_{t-1}$,
$X'_t = X_t - \rho X_{t-1}$, and $\rho$ is the correlation
between consecutive errors in the e-commerce demand forecast (Hanke and Wichern, 354).
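For readers who want the intermediate step, the generalized-differences equation follows directly from the autocorrelated-error model. Writing $\rho$ for the correlation between consecutive errors (standard textbook notation, assumed here), lag the regression one period, multiply it by $\rho$, and subtract it from the original equation:

```latex
\begin{aligned}
Y_t &= \beta_0 + \beta_1 X_t + \varepsilon_t,
      \qquad \varepsilon_t = \rho\,\varepsilon_{t-1} + \nu_t \\
\rho Y_{t-1} &= \rho\beta_0 + \rho\beta_1 X_{t-1} + \rho\,\varepsilon_{t-1} \\
Y_t - \rho Y_{t-1} &= \beta_0(1-\rho) + \beta_1\,(X_t - \rho X_{t-1})
      + (\varepsilon_t - \rho\,\varepsilon_{t-1}) \\
Y'_t &= \beta_0(1-\rho) + \beta_1 X'_t + \nu_t
\end{aligned}
```

The last line uses $\varepsilon_t - \rho\,\varepsilon_{t-1} = \nu_t$, so the transformed regression has independent errors. Setting $\rho = 1$ recovers the simple first-difference regression as a special case.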
These are the three possible problems, with their corresponding solutions, for a
regression analysis of time series data. All of them are relatively big challenges to the
forecasting manager when using this model, and the solutions, although available, are complicated
in nature and sometimes unrealistic. However, when modeling data such as that available for
e-commerce, it might not be our choice but rather a last resort when all else has failed. In the first
part of this paper we mentioned the difficulties, or rather the impracticality, of using a standard
multiple regression analysis on exogenous regressors or independent variables. We have also
established that standard time series models like moving averages, exponentially
weighted moving averages, and even the (standard or additive) Holt-Winters model[v] are not
feasible for forecasting demand for e-commerce due to factors like a very large customer base,
seasonal patterns that change simultaneously across regions, and rapidly changing customer buying
criteria (Hanke and Wichern, 126-136). Therefore, given all these challenges, it is left up to me
to show that the most feasible model is a regression analysis on time series data. For this, an
important step is to review the test statistics, check the degree of autocorrelation in the
available e-commerce demand data, and hope that it passes the Durbin-Watson test[vi] (Hanke
and Wichern, 344-347). There is one hiccup in this respect, though. I apologize for the suspense
up to this point, although you, dear reader, may have realized by now that we cannot
do a regression analysis on time series data if there are no independent variables or exogenous
regressors associated with or predicting the demand for e-commerce. We have already concluded that
the use of qualitative methods to find relevant independent variables or regressors is very
difficult or unrealistic. Therefore, at this point, we are going to rule out the third model,
regression with time series data using regressors. Not all is gloomy news, though. The
description of the problems and solutions to autocorrelation of this last model has
shed light on a model that might be our last hope in finding a solution to the challenges of
forecasting demand for e-commerce. This final model is called an autoregressive model and is built
on the idea that autocorrelation is not too bad after all and can be used as a predicting factor
for this type of data. Therefore, the autocorrelation shown in Figures 2 and 4 will allow us to
run an efficient autoregressive model that will predict future demand for e-commerce at the
macro level, at least more efficiently than the rest of the models explained
here.
DURBIN-WATSON HYPOTHESIS TEST (TESTING FOR AUTOCORRELATION)
$H_0: \rho = 0$ (there is no autocorrelation)
$H_1: \rho > 0$ (there is significant autocorrelation)
$$DW = \frac{\sum_{t=2}^{n} (e_t - e_{t-1})^2}{\sum_{t=1}^{n} e_t^2}$$
(Hanke and Wichern, 344-347)
One way to determine the type of autocorrelation is to calculate the Durbin-Watson test
statistic and then compare it with the lower and upper bounds in the DW test bounds table
(Hanke and Wichern, 344-345).
$$DW = \frac{\sum_{t=2}^{n} (e_t - e_{t-1})^2}{\sum_{t=1}^{n} e_t^2} = 1.102539$$
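As a rough check, the Durbin-Watson statistic can be computed directly from its formula. The sketch below assumes the residuals are the rounded actual-minus-forecast values for 2003 to 2013 listed in Table 1, and it uses the bounds quoted in this section ($d_L = 1.08$, $d_U = 1.36$). The function names are hypothetical, and EViews works with unrounded residuals, so small differences are to be expected.

```python
import numpy as np

# Residuals (actual minus forecast) for 2003-2013, rounded, from Table 1.
residuals = np.array([-0.46, 2.30, 4.02, 5.72, 6.39, -8.95,
                      -28.11, -4.05, 4.73, 9.40, 9.02])


def durbin_watson(e):
    """DW = sum_{t=2..n} (e_t - e_{t-1})^2 / sum_{t=1..n} e_t^2."""
    e = np.asarray(e, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)


def dw_decision(dw, d_lower, d_upper):
    """Classify a test for positive autocorrelation using table bounds."""
    if dw < d_lower:
        return "positive autocorrelation"
    if dw > d_upper:
        return "no evidence of positive autocorrelation"
    return "inconclusive"


dw = durbin_watson(residuals)
decision = dw_decision(dw, d_lower=1.08, d_upper=1.36)
```

With these rounded residuals the statistic comes out at roughly 1.10, in line with the EViews value, and it falls between the two bounds.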
To make things easier and more understandable, we are going to use the already
calculated DW test statistic provided in Figure 2, EViews Regression Analysis on United States
B2C e-commerce sales from 2002 to 2013 (in billions). Then, by looking at the Durbin-Watson
test bounds table[vii], we see that for a sample size [n] of eleven observations and [k]
lag 1, the upper bound is roughly [dU] = 1.36 and the lower bound is [dL] = 1.08.[viii] It is
possible to have an inconclusive Durbin-Watson test for autocorrelation. If the DW statistic falls
between the lower and upper bounds, as it does in this case, then we cannot conclude that there is
indeed autocorrelation. The inability to perform a successful DW test is mainly because the
forecast exhibits a heteroskedastic error term. However, we can still test the residual
autocorrelation coefficient at the 5% significance level. This is our last resort to prove there
is autocorrelation. If
$$r_k(e) = \frac{\sum_{t=k+1}^{n} e_t e_{t-k}}{\sum_{t=1}^{n} e_t^2} = 1.0116$$
falls within $0 \pm \frac{2}{\sqrt{11}} = \pm 0.6030$, we say there is no autocorrelation. Since
the residual autocorrelation coefficient does not fall within this interval, we can
conclude there is autocorrelation and can build our model on the data available for e-commerce
sales from 2002 to 2013 (Hanke and Wichern, 344).
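The mechanics of this second check can be sketched in a few lines of Python. The residual series below is hypothetical and chosen only to illustrate the calculation; it is not the residual series behind the EViews output, and the function names are assumptions made for this example.

```python
import math

import numpy as np


def residual_autocorrelation(e, k=1):
    """r_k(e) = sum_{t=k+1..n} e_t * e_{t-k} / sum_{t=1..n} e_t^2, for k >= 1."""
    e = np.asarray(e, dtype=float)
    return np.sum(e[k:] * e[:-k]) / np.sum(e ** 2)


def outside_band(r, n):
    """True when r lies outside the approximate 5% band 0 +/- 2/sqrt(n)."""
    return abs(r) > 2.0 / math.sqrt(n)


# Hypothetical residuals, for illustration only.
e = [1.0, 2.0, 3.0, 2.0, 1.0, -1.0, -2.0, -3.0, -2.0, -1.0, 1.0]
r1 = residual_autocorrelation(e, k=1)
band = 2.0 / math.sqrt(len(e))
significant = outside_band(r1, len(e))
```

For eleven observations the band is about 0.603 on either side of zero, the same interval used in the text. A coefficient outside the band is read as evidence of autocorrelation at lag k.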
FIGURE 2: EViews Regression Analysis on United States B2C e-commerce sales from 2002 to
2013 (in billions)[ix]
[Figure: chart titled "Actual vs. Forecast Sales E-commerce, U.S. 2002 - 2014"; horizontal axis: Year; vertical axis: Sales in Billions]
FIGURE 3: Actual vs. Forecast U.S. B2C Sales from 2002 to 2014 with polynomial order 6
trend lines.[x]
FIGURE 4: EViews Correlogram showing the autocorrelation between U.S. e-commerce sales
and Lag_1 of the same data.[xi]
AUTOREGRESSIVE MODEL
Figure 4 shows that there is indeed autocorrelation in
the data from 2002 to 2014 of sales in the e-commerce industry in the U.S. The first
bar, depicting the autocorrelation between U.S_sales and U.S_sales_lag_1, is significant at the
95% confidence level. Therefore, our final autoregressive model looks like this:
$$\hat{Y}_t = b_0 + b_1 Y_{t-1}$$
$$\hat{Y}_{2014} = 20.62235 + 1.011624\,Y_{2013} = \$346.37 \text{ (billions)}$$
The error term is omitted because, as in a standard linear regression model estimated by
ordinary least squares [OLS], it is assumed to be 0 or asymptotically close to 0 (Hanke and
Wichern, 357).
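The one-step-ahead calculation can be reproduced in a few lines. The sketch below plugs the reported coefficients ($b_0 = 20.62235$, $b_1 = 1.011624$) into the autoregressive equation and applies it to the annual sales listed in Table 1. The function name `ar1_forecast` is hypothetical.

```python
# Coefficients reported for the fitted autoregressive model (EViews output).
B0 = 20.62235
B1 = 1.011624

# U.S. B2C e-commerce sales in billions of dollars, from Table 1.
sales = {2002: 72, 2003: 93, 2004: 117, 2005: 143, 2006: 171, 2007: 200,
         2008: 214, 2009: 209, 2010: 228, 2011: 256, 2012: 289, 2013: 322}


def ar1_forecast(previous_value, b0=B0, b1=B1):
    """One-step-ahead AR(1) forecast: Y_hat_t = b0 + b1 * Y_{t-1}."""
    return b0 + b1 * previous_value


# One-step-ahead forecasts for 2003 through 2014.
forecasts = {year + 1: ar1_forecast(value) for year, value in sales.items()}
forecast_2014 = forecasts[2014]
```

Rounded to two decimals, the 2014 forecast is 346.37, matching the figure above. Each forecast uses only the previous year's actual sales, which is why the model's horizon is a single period.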
TABLE 1: Actual vs. Forecast U.S. B2C Sales from 2002 to 2014: data, calculations, and forecast
accuracy measures.
Period  Year  Sales_US  Autoregressive Model Forecast  Actual - Forecast  Abs. Value Error  Abs % Error  Squared Error
1       2002  72
2       2003  93        93.46                          -0.46              0.46              0.005        0.21
3       2004  117       114.70                         2.30               2.30              0.020        5.27
4       2005  143       138.98                         4.02               4.02              0.028        16.14
5       2006  171       165.28                         5.72               5.72              0.033        32.67
6       2007  200       193.61                         6.39               6.39              0.032        40.83
7       2008  214       222.95                         -8.95              8.95              0.042        80.05
8       2009  209       237.11                         -28.11             28.11             0.134        790.17
9       2010  228       232.05                         -4.05              4.05              0.018        16.42
10      2011  256       251.27                         4.73               4.73              0.018        22.35
11      2012  289       279.60                         9.40               9.40              0.033        88.40
12      2013  322       312.98                         9.02               9.02              0.028        81.33
13      2014  322       346.37                         -24.37             24.37             0.076        593.67
                                                                          MAD = 107.50      MAPE = 0.47  MSE = 1767.50
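FORECAST ACCURACY MEASURES
Mean absolute deviation: $\text{MAD} = \frac{1}{n}\sum_{t=1}^{n} |Y_t - \hat{Y}_t|$ (Hanke and
Wichern, 82)
Mean absolute percentage error: $\text{MAPE} = \frac{1}{n}\sum_{t=1}^{n} \frac{|Y_t - \hat{Y}_t|}{Y_t}$
(Hanke and Wichern, 83)
Mean square error: $\text{MSE} = \frac{1}{n}\sum_{t=1}^{n} (Y_t - \hat{Y}_t)^2$
(Hanke and Wichern, 82)
Over the 12 forecast periods of Table 1 ($t = 2, \ldots, 13$), the sums of the absolute errors,
absolute percentage errors, and squared errors are 107.50, 0.47, and 1767.50, the values reported
in the table. Dividing each sum by $n = 12$ gives MAD ≈ 8.96, MAPE ≈ 0.039, and MSE ≈ 147.29.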
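The three measures above can be recomputed from the Table 1 columns with a few lines of Python. This is a sketch that uses the rounded actual and forecast values as printed in the table, with illustrative variable names. Both the column totals and the per-period means (totals divided by n = 12) are computed, so either can be compared with the figures quoted in the text.

```python
import numpy as np

# Actual sales and autoregressive forecasts for 2003-2014, from Table 1.
actual = np.array([93, 117, 143, 171, 200, 214, 209, 228, 256, 289, 322, 322],
                  dtype=float)
forecast = np.array([93.46, 114.70, 138.98, 165.28, 193.61, 222.95,
                     237.11, 232.05, 251.27, 279.60, 312.98, 346.37])

errors = actual - forecast
n = len(errors)

sum_abs = np.sum(np.abs(errors))           # total absolute error
sum_pct = np.sum(np.abs(errors) / actual)  # total absolute percentage error
sum_sq = np.sum(errors ** 2)               # total squared error

mad = sum_abs / n   # mean absolute deviation
mape = sum_pct / n  # mean absolute percentage error, as a fraction
mse = sum_sq / n    # mean squared error
```

Because the inputs are rounded to two decimals, the results can differ slightly in the last digit from values computed on unrounded forecasts.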
Then, how accurate is this autoregressive model? The MSE tells how large the error is in
magnitude and since it is squared this measure tends to be very big for larger samples. In this
case, our sample is rather small, so it makes us wonder how good this model is. On the other
hand, the mean absolute percentage error is quite small; less than 1. This is usually good news
and often discredits MAPE’s measures. In my opinion, measures of bias tend to be more
important for smaller sample data than measures of magnitude, because we are not accounting
for degrees of freedom like we do in standard linear regression models. However, as I have
mentioned before, forecasts are only as good as the results they yield, and the criteria for
evaluating the results depend on the data and the forecaster. We have already explained the
characteristics and challenges that the demand for e-commerce presents in terms of data collection
and reliability. Therefore, we expected a rather large forecast accuracy measure in terms of
magnitude and prefer a small measure of bias. This is in fact what the three forecast accuracy
measures are telling us about the autoregressive model of demand for e-commerce. Nonetheless,
we cannot definitively assert how good the forecast really is in the long run, because of the
short-term scope of this model. The autoregressive forecasting process described and
calculated here has the same short-term, or micro-focused, character of the time series
models that we have been trying to refute.
CLOSING COMMENTS
Finally, I would like to leave this discussion about the challenges that forecasting demand
for e-commerce presents for decision makers in today’s globalized, sparse, and at the same
time interconnected world, with an open-ended question: Is there really an accurate way of
predicting demand in a fast-paced, evolving, and innovative industry like e-commerce? “Accurate”
is a rather broad term in forecasting, and probably every replenishment manager would quote or
make up a different definition for it. Recall, one more time, the quote at the beginning:
“forecasts are almost always wrong, if not always” (Carroll). But truly, what makes a forecast
right or wrong, less accurate, or closer to the forecaster’s or decision maker’s expectations is
the results; what might work for some people might not work for others. In an ideal world
we would be able to find independent variables that predict e-commerce demand in an unbiased
and accurate manner, but that is unlikely today. Yes, companies are achieving real breakthroughs in
that area with CRMs and other marketing strategies, but that is not enough to achieve the real
competitive advantage that they are looking for. Amazon looks promisingly close to that target,
but there is a long stretch yet ahead of it. Furthermore, the challenge is even bigger today
with trends like omni-channel commerce. Disruption in social behavior and customer buying
criteria means more competition as e-tailers fight to achieve higher market share in this
stochastic business model.
Works Cited
avaya.com. Avaya Customer Relationship Management (CRM) Integration. 23 Feb 2015
<http://www.avaya.com/usa/documents/avaya_customer_relationship_management_integrationgcc4792-02.pdf>.
Carroll, Matthew. Forecasting Revenue & Expenses for an E-Commerce Startup: Sales Build. 20
Feb 2015 <http://retail-analytics.quora.com/Forecasting-Revenue-Expenses-for-an-ECommerce-Startup-Sales-Build>.
DeMers, Jayson. The Top 10 Benefits Of Social Media Marketing. 11 August 2014. 20 Feb 2015
<http://www.forbes.com/sites/jaysondemers/2014/08/11/the-top-10-benefits-of-social-mediamarketing/>.
Forrester Consulting. Customer Desires Vs. Retailer Capabilities: Minding The Omni- Channel
Commerce Gap. Forrester Research, Inc. accenture.com, January 2014.
Hanke, John E. and Dean W Wichern. Business Forecasting. Ed. Eric Svendsen. Ninth Edition.
Upper Saddle River: Pearson Prentice Hall, n.d.
"Production and Inventory Management." American Production and Inventory Management
27.1-2 (1986): 95.
statista.com. Annual B2C e-commerce sales in the United States from 2002 to 2013 (in billion
U.S. dollars). statista.com. 23 Feb 2015 <http://www.statista.com/statistics/271449/annual-b2c-e-commerce-sales-in-the-united-states/>.
[i] I will use this term on occasion, and interchangeably, for companies selling products solely via electronic transactions and for companies
practicing omni-channel marketing and operations, like Walmart Inc. and its division Walmart.com.
[ii] URLs of companies mentioned in this paper:
http://www.alibaba.com
http://www.amazon.com
[iii] Multicollinearity is a situation in which independent variables in a multiple regression model are highly correlated with each other. This
will create a forecast biased towards these inter-correlated variables and underestimate the rest of the regressors.
[iv] The correlation coefficient measures the strength of the correlation between two or more variables in a multiple regression analysis or a
linear regression model (37).
[v] Alternatives to the Holt-Winters model include multiplicative components with different variability across the data series (167).
[vi] The Durbin-Watson test statistic is used to test for the absence of positive lag_1 autocorrelation (343-344).
[vii] This table can be found in any business forecasting textbook. For this paper I used Business Forecasting by John E. Hanke and Dean W.
Wichern; see Works Cited for more information.
[viii] Since the table only includes bounds for samples of 15 or more observations, I will use the bounds for n=15.
[ix] Data source: statista.com. URL: http://www.statista.com/statistics/271449/annual-b2c-e-commerce-sales-in-the-united-states/
[x] This chart uses a naïve forecast for actual demand in 2014 (the same sales volume as the previous period) so that the actual and
autoregressive forecast values for 2014 can be compared for forecast measurement purposes.
[xi] Data source: Statistic.com