
Inventory Estimation From Transactions via
Hidden Markov Models
by
Nirav Bhan
B.Tech, Electrical Engineering
Indian Institute of Technology-Bombay, 2013
Submitted to the MIT Department of Electrical Engineering and
Computer Science
in Partial Fulfillment of the Requirements for the Degree
of
Master of Science in Electrical Engineering
at the
MASSACHUSETTS INSTITUTE OF TECHNOLOGY
September 2015
© 2015 Massachusetts Institute of Technology. All rights reserved.
Author
Department of Electrical Engineering and Computer Science
August 26, 2015
Certified by
Devavrat Shah
Associate Professor of Electrical Engineering and Computer Science
Thesis Supervisor
Accepted by
Leslie A. Kolodziejski
Professor of Electrical Engineering and Computer Science
Chair, EECS Committee on Graduate Students
Inventory Estimation From Transactions via
Hidden Markov Models
by
Nirav Bhan
Submitted to the MIT Department of Electrical Engineering and Computer Science
on August 26, 2015, in partial fulfillment of the
requirements for the degree of
Master of Science in Electrical Engineering
Abstract
Our work solves the problem of inventory tracking in the retail industry using Hidden
Markov Models. It has been observed that inventory records are extremely inaccurate
in practice (cf. [1–4]). Reasons for this inaccuracy include unaccounted item losses due
to theft, mishandling, etc. Even more important are the lost sales due to a lack of items
on the shelf, called stockout losses. In several industries, stockout is responsible for
billions of dollars of lost sales each year (cf. [4]). In [5], it is estimated
that 4% of annual sales are lost due to stockout, across a range of industries.
Traditional approaches toward solving the inventory problem have been geared
toward designing better inventory management practices, to reduce or account for
stock uncertainty. However, such strategies have had limited success in overcoming
the effects of inaccurate inventory (cf. [1]). Thus, inventory tracking remains an
important unsolved problem. The work done in this thesis is a step toward solving
this problem.
Our solution follows a novel approach of estimating inventory using accurately
available point-of-sales data. A similar approach has been seen in other recent work
such as [1, 6, 7]. Our key idea is that when the item is in stockout, no sales are
recorded. Thus, by looking at the sequence of sales as a time series, we can guess
the period when stockout has occurred. In our work, we find that under appropriate
assumptions, exact stock recovery is possible for all time.
To represent the evolution of inventory in a retail store, we use a Hidden Markov
Model (HMM), along the lines of [6]. In the latter work, the authors have shown that
an HMM-based framework, with Gibbs sampling for estimation, manages to recover
stock well in practice. However, their methods are computationally expensive and do
not possess any theoretical guarantees. In our work, we introduce a slightly different HMM to represent the inventory process, which we call the Sales-Refills model.
For this model, we are able to determine inventory level for all times, given enough
data. Moreover, our recovery algorithms are easy to implement and computationally
fast. We also derive sample complexity bounds which show that our methods are
statistically efficient.
Our work also solves a related problem, viz. accurate demand forecasting in the presence of unobservable lost sales (cf. [8–10]). The naive approach of computing a time-averaged sales rate underestimates the demand, as stockout may cause interested
customers to leave without purchasing any items (cf. [8, 9]). By modelling the retail
process explicitly in terms of sales and refills, our model achieves a natural decoupling
of the true demand from other parameters. By explicitly determining the instants where
the stock is empty, we obtain a consistent estimate of the demand.
Our work also has consequences for HMM learning. In this thesis, we propose an
HMM model which is learnable using simple and highly efficient algorithms. This is
not a usual property of HMMs; indeed, several problems on HMMs are known to be
hard (cf. [11–13]). The learnability of our HMM can be considered a consequence of
the following property: we have a few parameters which vary over a finite range, and
for each value of the parameters we can identify a signature property of the observation sequence. For the Sales-Refills model, the signature property is the location of
longer inter-sale intervals in the observation sequence. This simple idea may lead to
practically useful HMMs, as exemplified by our work.
Thesis Supervisor: Devavrat Shah
Title: Associate Professor
Acknowledgements
I would firstly like to express my utmost gratitude to my advisor, Devavrat Shah.
His guidance has been instrumental in my research. His enthusiasm, wisdom and
patience are remarkable, and I hope that some of these qualities rub off on me during
the course of our work together. In addition, his detailed guidance with writing has
allowed me to express my thoughts more clearly.
Secondly, I would like to express my thanks toward my friends. I wish to thank
Pritish, Anuran, Gauri, Sai, Ganesh and many others who have made my stay at
MIT pleasurable.
Thirdly, I would like to thank my past advisors, Prof. Vivek Borkar and Prof.
Volkan Cevher. The positive research experiences I had with them are responsible
for my pursuing an academic career.
Lastly, I am deeply thankful to my parents. Their support has been key to my
ability to focus on work undisturbed. Although leaving home for the first time is not
easy, for me or them, they have shown a great deal of support and understanding,
allowing me to do what I love.
Contents

1 Introduction
  1.1 Background: Inventory Inaccuracy in Retail Industry
  1.2 Our Approach
    1.2.1 Literature Survey
    1.2.2 Our contributions
  1.3 A spectral algorithm for learning HMM
  1.4 Formulation as a Constrained HMM Learning Problem
  1.5 Organization of Thesis

2 Model Description and Estimation
  2.1 Definitions
  2.2 The Sales-Refills Model
    2.2.1 Overview
    2.2.2 Formal definition
  2.3 Estimation

3 Algorithm
  3.1 Estimating stock (or inventory)
    3.1.1 Estimating $C$, $X_0 \bmod C$
    3.1.2 Estimating hidden stock
  3.2 Estimating $q$ and $p$
  3.3 Computational efficiency
  3.4 General $C$
    3.4.1 Idea
    3.4.2 General estimator for $C$, $U$

4 Proof of Estimation
  4.1 Invariance modulo $C$
  4.2 Correctness of $\hat{C}^T$ and $\hat{U}^T$
  4.3 Correctness of $\hat{X}_t^T$
  4.4 Correctness of $\hat{q}^T$
  4.5 Correctness of $\hat{p}^T$
  4.6 Correctness of $\bar{C}^T$, $\bar{U}^T$
  4.7 No-refill probability

5 Sample complexity bounds
  5.1 Error bounds for stock estimation
  5.2 Data sufficiency

6 Generalized and Noise-Augmented models
  6.1 Generalized Sales-Refills model
    6.1.1 Modifications
    6.1.2 Description
  6.2 Estimation of the Generalized Sales-Refills model
    6.2.1 Simplifying Assumptions
  6.3 Estimation with Uniform Aggregate Demand
    6.3.1 Estimation of stock and $U$
    6.3.2 Estimation of sales parameters
  6.4 Augmenting the Sales-Refills model with Noise
    6.4.1 Description of Noisy Sales-Refills Model
    6.4.2 Equivalence of hidden sales and source uncertainty viewpoints
    6.4.3 Estimation with Noise

7 Conclusion
  7.1 Summary of contributions
  7.2 Significance and Future Work

List of Figures

2.1 Shifted order-2-hmm, representing the hidden state variables $X_t$'s and observation variables $Y_t$'s. Notice that the observation sequence has shift 1.
2.2 State transition diagram for the stock MC. Sales occur in all states besides 0, while refills only occur in states ≤ R.
Chapter 1
Introduction
1.1 Background: Inventory Inaccuracy in Retail Industry
Inventory inaccuracy is an important operational challenge in the modern retail industry, cf. [1–4, 6, 8]. In spite of the availability of modern high-tech tools and methods,
it is found that many store managers do not know their inventory correctly. For example, the authors of [2] found that over 65% of all Stock Keeping Units (SKUs)
in a study of 370,000 SKUs did not match the physical inventory. Moreover, about
20-25% of the SKUs differed from the inventory by six or more items. Such inaccurate
records can have a serious negative impact on operational decisions in the store. For
example, stores rely on the inventory record to order a fresh batch of items, replenishing sold-out products on the shelf. If such orders are not made in a timely manner,
they may lead to a ‘stock-out’ situation, where sales are halted due to lack of items on
the shelf. Clearly, stock-out leads to significant losses for the store in terms of missed
sales. This problem is even more acute for stores that rely on automated inventory
management systems. In [4], it is estimated that the revenue lost due to stockout
equals 4% of annual sales, averaged across various industries.
There are several explanations for why the inventory records in stores are not
accurate. The most important reason is widely believed to be stock loss (cf. [4]).
Stock loss (also known as inventory shrinkage) refers to the loss of items in the store
due to employee theft, shoplifting, mishandling, damage etc. By its nature, stock loss
is unpredictable and hence very difficult to account for. It can also mean significant
economic losses for the store. In [4], it is noted that stores lose 1.5% to 2% of their
annual sales due to stock loss. They also note that for certain goods like batteries and
razor blades, which are prone to theft due to their small size and high value, the stock
loss can be as high as 5 to 8%. Unfortunately, stores are often ill-equipped to combat
these effects due to poor inventory records. Moreover, although stock loss is an
important economic issue in its own right, its consequence, viz. inventory inaccuracy,
can have an even bigger impact, due to the stock-out phenomenon noted above. In [3],
the authors explain how inventory inaccuracy, while a serious issue for all industries,
can be especially problematic in industries where there is close collaboration between
various levels of the supply chain, and where demand uncertainty is lower and lead
times are shorter.
All of the above shows that inventory record inaccuracy is an important issue,
and one that traditional methods have failed to resolve. Recently, there have been
some attempts to use Machine Learning methods for solving these problems. These
methods are based on two key observations:
1. In all modern stores, sales are recorded in a highly detailed, accurate and usable
form.
2. When stockout happens, no sales will be recorded in the system, for a long
period of time.
Since the inventory must be regarded as not fully known, it is customary to replace
hard knowledge of the inventory with a probabilistic representation of the same. For
example, in [1], the authors use a Bayesian inventory record, which is a vector of
probabilities over the various inventory levels. The beliefs are updated from one time
period to the next based on three factors: the belief in the previous interval, the sales
recorded, and the time since the last sale. The authors also provide evidence via
simulations that their idea is useful in practice.
There is another important problem in retail that has attracted a lot of attention.
This is the problem of estimating aggregate demand for supplies. The obvious method
for estimating demand is to divide the number of sales in a large time period by the
total time. This gives us the average number of sales per unit time, which is assumed
to equal demand. The problem with this approach is that it ignores the phenomenon
of stockout, viz. that no sales occur when shelves are empty. Hence, the number of sales
per unit time strictly underestimates the actual demand, and it is difficult
to guess this gap. Statistical and machine learning techniques have been applied
to this task for quite some time. Works such as [8–10] address this problem using
quite elaborate models. In our work, we present a relatively simple HMM model
which addresses both of the above problems viz. inventory inaccuracy and demand
estimation.
1.2 Our Approach
A common feature of the two problems above is that they both rely on identifying
stockout periods. It can therefore be imagined that a single mechanism could solve
both problems. Our work in this thesis can be regarded as an important step in
this direction. Our work solves the problem of inventory inaccuracy in a novel way,
inspired by [6]. Following their approach, we give a Hidden Markov Model (HMM)
description of the retail system, which we call the Sales-Refills model. The hidden
state in our HMM is precisely the inventory level. Using accurately recorded sales
data, we show that this hidden state can be estimated consistently for all time. The
estimation of the hidden state is performed by identifying the stockout periods in a
clever way. This enables us to solve both the problems of inventory inaccuracy and
demand estimation in retail.
As an illustrative example, consider a candy shop using a point-of-sales (POS)
system like Square, Inc. The candy shop sells one type of candy. Whenever a
customer comes to the shop, if the number of candies is not 0 (i.e. stockout has not
happened), then the customer may buy a candy, resulting in a transaction being
recorded in the POS. On the other hand, if no candies are available, then the customer
will not be able to purchase any candy. Of course, this is confounded by the fact that
a lack of arriving customers also results in no transaction being recorded. Thus,
observing transactions provides some information about the hidden state (the number of
candies, or inventory), but it is not clear whether we can reconstruct the exact value of the
inventory from transactions.
To aid the process of estimating the hidden state (inventory), we take note of
the following standard practice in the retail industry, called the (s, S) policy (cf. [14, 15]):
every time the inventory level goes below a threshold s ≥ 0, a refill is ordered to bring the
inventory to level (at least) S > s. With the knowledge of this practice, along with
the model that customers arrive as per a Bernoulli process in a discrete-time system
(cf. [8]) and make purchasing decisions independently, the resulting setup is that
of a Hidden Markov Model, where the hidden state is the inventory, which evolves as per a
Markov chain, and the observations, the transactions, depend on the hidden state.
1.2.1 Literature Survey
The problem of estimating inventory and demand from observations has been considered in earlier work [1, 6, 7, 9, 10]. Our work is inspired by that of [6], where the authors
introduce a (second order) Hidden Markov Model for the question of inventory estimation from transaction data that we briefly alluded to above. In [6], the authors
show that by using Gibbs sampling to estimate the marginals in the usual method for
HMM estimation, it is feasible to estimate inventory accurately in practical settings. However, their result lacks theoretical guarantees. We also take note of [9], which
utilizes the Baum-Welch (or EM) algorithm for estimation of demand in a similar
context.
In general, this problem can be seen as estimation of the hidden state of an HMM. It is
well known that in general, estimation of the hidden state is not possible from observations
(cf. [11]). However, under model-dependent specific conditions, it may be feasible
to recover the hidden state from observations. For example, over the years, various approaches have been
developed to learn mixture distributions (the simplest form of
HMM) [16–20]. In the recent and not so recent past, various sufficient conditions for
identifiability of HMMs have been identified, cf. [21–23]. None of these approaches
seems to provide a meaningful answer to the HMM estimation problem considered here.
1.2.2 Our contributions
We provide a simple estimation procedure for which we establish that all the inventory
states can be estimated accurately with probability 1 after enough observations are
made. We derive a precise sample complexity bound on how many samples are
needed to estimate the inventory at all times with high enough probability, and we find
it to be efficient. Precisely, to succeed with probability at least $1 - \varepsilon$, it takes $O\!\left(\frac{1}{\varepsilon^2}\right)$ samples.¹

¹This is a loose bound; we actually get close to $O\!\left(\frac{1}{\varepsilon}\right)$ in chapter 5. We believe that stronger bounds should be possible for our model.
The key insight in our ability to learn the hidden state of the HMM is as follows.
In the HMM of interest, we establish that the primary source of uncertainty arises
from the unknown value of the inventory at the beginning (the initial state), along with
the refill amount. That is, if the initial state and refill amount are known, then by using
transaction information, we can reconstruct the value of the inventory at all subsequent
times. Here we utilize the fact that the inventory refill happens as per an (s, S)
policy (with unknown S). If the maximum value of S is known (i.e. an upper bound on
S), then the effective uncertainty in the system is captured by finitely many different
values (combinations of the initial state value and the value of S). For each of these
finitely many options, we produce a verifiable condition that is asymptotically true if and only if
that particular option corresponds to the system uncertainty. This leads to a
consistent estimator of the unknown inventory. The finite sample bound follows by
showing that this asymptotic property holds with high probability after few samples.
Thus we obtain an efficient estimator.
In addition to solving the specific problem of mitigating inventory inaccuracies
using transaction data, our approach identifies a class of Hidden Markov Models that
are learnable. This class of models may be simple, but as exemplified by the above
setting, it could be of relevance in practice.
1.3 A spectral algorithm for learning HMM
We give a brief summary of the work in [21], and explain why it does not apply to our
current problem. In this work, the authors propose a method for learning comb-like
graphical models, which can be easily extended to learning HMMs. They provide a
PAC algorithm for learning a set of probabilities such that the output distribution
is close to the empirical output distribution (cf. Theorem 5 in [21]). We paraphrase
this theorem below for completeness.
Theorem A (From [21], Theorem 5). Let $\phi_d, \kappa_\pi > 0$ be constants. Let $\mathcal{C}$ be a
finite set and $\mathcal{M}_n$ denote the collection of $|\mathcal{C}| \times |\mathcal{C}|$ transition matrices $P$, where
$1 \ge |\det P| \ge n^{-\phi_d}$. Then there exists a PAC-learning algorithm for $(T_{\mathcal{C}}^3(n) \otimes \mathcal{M}_n, n^{-\kappa_\pi})$.
The running time and sample complexity of the algorithm is $\mathrm{poly}(n, |\mathcal{C}|, \frac{1}{\varepsilon}, \frac{1}{\delta})$.

Note: The object $(T_{\mathcal{C}}^3(n) \otimes \mathcal{M}_n, n^{-\kappa_\pi})$ refers to the collection of all possible caterpillar trees on $n$ leaves, endowed with transition matrices from $\mathcal{M}_n$, for which the
stationary probability at every node for every state exceeds $n^{-\kappa_\pi}$.
Strictly speaking, the above method only learns finite-size, HMM-like graphical
models. To extend this method to HMMs, we use a result from [24]. This result
allows us to characterize the infinite output distribution of an HMM with a small block
of the output distribution. Precisely, their result is as follows:
Theorem B (From [24], Theorem 3). Let $\Theta_{(d,k)}$ be the class of all HMMs with output
alphabet size $d$ and hidden state size $k$. There exists a measure zero set $E \subset \Theta_{(d,k)}$ such
that for all output processes generated by HMMs in the set $\Theta_{(d,k)} \setminus E$, the information
in $P^{(N)}$ is sufficient for finding the minimal HMM realization, for $N = 2n + 1$, with
$n > 2\lceil \log_d(k) \rceil$.
Combining the results from Theorems A and B above, we obtain a method for
learning an HMM. This can be done as follows: instead of learning the entire (infinite)
output distribution of the HMM, we learn a finite block of our HMM of size
$\ell = 4\lceil \log_d k \rceil + 1$. We can obtain i.i.d. samples for this finite graphical model from the
infinite output stream of our HMM, by collecting blocks of $\ell$ consecutive samples,
leaving a gap of several observations after each block. Assuming the HMM is ergodic,
which is always true in practice, we shall obtain samples from the finite-length output
distribution that are nearly independent. These samples can be used to learn the
transition probabilities for our graphical model according to Theorem A. By Theorem
B, the same probabilities will correctly reproduce the infinite output distribution of
the HMM.³

³Note that the finite graphical model could, in principle, have different transition and emission
probabilities for each of its edges. However, one expects the learnt probabilities to be equal in the
limit, since our observations are generated from a time-homogeneous HMM.
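To make the block-sampling workaround concrete, here is a minimal sketch (ours, not from [21] or [24]; the function and its parameters are illustrative) of harvesting near-i.i.d. blocks of length $\ell$ from a single output stream:

```python
import itertools

def block_samples(stream, block_len, gap, n_blocks):
    """Collect nearly independent fixed-length blocks from one long
    HMM output stream by discarding `gap` symbols between blocks.
    For an ergodic, fast-mixing chain, widely separated blocks are
    approximately i.i.d. draws from the finite-block distribution."""
    it = iter(stream)
    blocks = []
    for _ in range(n_blocks):
        block = tuple(itertools.islice(it, block_len))
        if len(block) < block_len:
            break                  # stream exhausted
        blocks.append(block)
        for _ in range(gap):       # burn the separating gap
            next(it, None)
    return blocks
```

The gap must be large relative to the mixing time of the chain; this is precisely the near-independence caveat raised in limitation 3 below.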
Limitations:

While the above result constitutes a useful step toward solving the hard problem of
HMM learning, it has several limitations. One consequence of these limitations is that
the method is ineffective for our Sales-Refills model. We describe these limitations
below.
1. Homogeneity of Alphabet Size: The result of Theorem A requires all nodes
in the graphical model to have the same state space. In particular, this means
that we can only learn HMMs for which the number of hidden states equals the
alphabet size of the observed variable. Hence, this method cannot be used for
our Sales-Refills HMM, since the hidden state takes $C + R + 1$ different values,
while the observation variable is binary.
2. Spectral Conditions on Matrices and Stationary Probabilities: Theorem A also requires two important conditions to be satisfied, viz.

   • All probability matrices (i.e. both transition and emission matrices) must
   have determinant bounded polynomially away from 0.

   • All nodes must take on each possible state with probability that is polynomially bounded away from 0. In other words, there should be no 'rare'
   states at any node.

   These constraints restrict the class of models for which these methods can be
   applied. However, there is a reason why such constraints are needed, viz. to
   avoid hard instances such as the 'Parity Learning with Noise' HMM described
   in [21].
3. Independence of Samples: Theorem A uses the concept of PAC-learning,
and hence it assumes the existence of an oracle which provides i.i.d. samples
from the output distribution of a block graphical model. In practice we are
likely to not have i.i.d. samples, but a single, infinitely long, continuous data
stream from the output of an HMM. Our workaround is to take block samples
at large intervals, and assume these to be almost i.i.d. (assuming ergodicity).
While one expects this method to work in practice, the theorems would have to
be adapted to work with near-independence instead of independence, and the
error rates would have to be calculated.
4. Interpretability/Learning a Constrained System: We explore yet another issue, which is not often discussed in relation to HMM learning, although
it is quite important. In our Sales-Refills model, we are not trying to learn an
HMM from scratch, but rather trying to learn a model for which some information about the parameters is already available (see section 1.4 for a detailed
description). There does not seem to be any easy way to adapt the results of
Theorem A for such a constrained HMM learning problem. If we ignore these
constraints, the model may not even be identifiable, as indeed the Sales-Refills
model is not. Although the PAC-learning framework does not require identifiability, and hence in principle it may be possible to apply Theorem A for
such an HMM, the resulting parameters are unlikely to have any meaning (since
they do not satisfy the constraints) and hence may not be 'interpretable'.
5. Measure-Zero Set Issues: The result of [24], viz. Theorem B, holds for all
HMMs except those in a measure zero set in the parameter space. Unfortunately,
whether measure zero sets are 'negligible' in practice is a rather philosophical
question. Since our linearly constrained HMM model has many parameters
forcibly set to 0, it is not easy to say our HMM is outside the measure zero set.
In case the result of Theorem B is not applicable, there exist other results for
sufficiently large block size, but the required blocks may be much larger (polynomial rather
than logarithmic in size) and hence may not be very efficient.
1.4 Formulation as a Constrained HMM Learning Problem
The HMM problem we are trying to solve can be described as learning the parameters
of the model given some partial knowledge about them. For our particular HMM,
these parameters are the initial state, the (hidden state) transition probabilities, and
the emission probabilities. The partial knowledge available to us in this case may
be regarded as linear constraints imposed upon the probability matrices. In
our work, we also show that it is possible to learn a high-level parameter $C$ which
influences the output distribution in a more complex way. To present our problem
in a more general framework, we shall simplify the problem here by assuming $C$ is
known. We also make one departure from our Sales-Refills model in what follows: we
make sale and refill events exclusive rather than independent.⁴

⁴In this case, the transition probabilities for low stock (i.e. less than $R$) become ternary,
divided into $p$, $q$, and $1 - (p + q)$. The reader can verify that this model is no more complex than
the model described in chapter 2, and can be solved by the same methods.
The HMM learning problem we solve can be regarded as a special case of the
following general problem:

Definition 1 (HMM Learning with Linear Constraints). Given a partial HMM description, viz. a hidden state size $k$, an output alphabet size $d$, and constraint matrices $A_e$,
$A_t$, $B_e$, $B_t$, where the transition and emission probability matrices $P_t$ and $P_e$ satisfy
the following:
$$A_e \cdot P_e \le B_e, \qquad A_t \cdot P_t \le B_t,$$
the problem is to find consistent estimators $\hat{P}_t^T$ and $\hat{P}_e^T$ based on $T$ observations,
such that
$$(\hat{P}_t^T, \hat{P}_e^T) \to (P_t, P_e)$$
as $T \to \infty$. Furthermore, we can solve the problem 'efficiently' if our algorithm
produces estimates of the stochastic matrices such that every element of the matrix
is within distance $\varepsilon$ of its true value, with probability exceeding $1 - \delta$, in time upper
bounded by a polynomial in $\frac{1}{\delta}$ and $\frac{1}{\varepsilon}$.
We will now show that our problem fits into this general framework. For this,
we shall briefly describe our model from a purely mathematical point of view. For a
more complete discussion including its motivation from the retail industry, the reader
is encouraged to read chapter 2.
Our HMM has $C + R + 1$ hidden states; that is, the hidden state takes values in the set
$\{0, 1, 2, \ldots, C + R\}$ (recall that $C$ is known). The observation alphabet is
of size 2, i.e. binary. We have the following information about the transition and
emission probabilities:
1. For any (hidden) state, the probability of moving to a lower state that is lower
by more than 1 equals 0. That is,
$$P(X_{t+1} < j - 1 \mid X_t = j) = 0, \quad \forall t, j.$$

2. For any non-zero state, the probability of moving to the state which is lower by
exactly 1 is a constant (i.e. independent of the state). That is,
$$P(X_{t+1} = j - 1 \mid X_t = j) = P(X_{t+1} = i - 1 \mid X_t = i), \quad \forall t,\ j \ge 1,\ i \ge 1.$$

3. For any state, the probability of moving to a bigger state equals 0, unless the
bigger state is exactly $C$ larger and the current state is less than or equal to
$R$. For transitions satisfying these two conditions, the transition probability is a
constant. That is,
$$P(X_{t+1} > j \mid X_t = j) = 0 \quad \forall t,\ j > R,$$
$$P(X_{t+1} = j \mid X_t = i) = 0 \quad \forall t,\ i \le R,\ j > i,\ j \ne i + C,$$
$$P(X_{t+1} = j + C \mid X_t = j) = P(X_{t+1} = i + C \mid X_t = i) \quad \forall t,\ i \le R,\ j \le R.$$

4. For any non-zero state, the probability of emitting a 1 is a constant. That is,
$$P(Y_t = 1 \mid X_t = j) = P(Y_t = 1 \mid X_t = i) \quad \forall t,\ i \ge 1,\ j \ge 1.$$

5. If the hidden state equals zero, the only symbol that can be emitted is a zero.
That is,
$$P(Y_t = 0 \mid X_t = 0) = 1 \quad \forall t.$$
All of the above facts can be encoded as linear constraints on the transition and
emission matrices.
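To illustrate how tightly facts 1–5 pin the model down, the following sketch (ours; it assumes the exclusive sales/refills variant of the footnote above, states $\{0, \ldots, C+R\}$, and $p + q \le 1$) builds a transition and emission matrix pair satisfying all five constraints from the scalars $(p, q, C, R)$ alone:

```python
import numpy as np

def build_matrices(p, q, C, R):
    """Construct (Pt, Pe) satisfying constraints 1-5 above; sales and
    refills are exclusive, so each row has at most three nonzero entries."""
    n = C + R + 1                     # hidden states {0, ..., C+R}
    Pt = np.zeros((n, n))
    for j in range(n):
        if j >= 1:
            Pt[j, j - 1] = q          # fact 2: constant sale probability
        if j <= R:
            Pt[j, j + C] = p          # fact 3: refill jumps up by exactly C
        Pt[j, j] = 1.0 - Pt[j].sum()  # facts 1, 3: all other moves forbidden
    Pe = np.zeros((n, 2))             # binary observation alphabet
    Pe[0, 0] = 1.0                    # fact 5: empty stock can only emit 0
    Pe[1:, 1] = q                     # fact 4: constant emission off zero
    Pe[1:, 0] = 1.0 - q
    return Pt, Pe
```

Only $p$ and $q$ (and, in the general problem, $C$ and the initial state) remain free, which is what makes the constrained learning problem tractable.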
Hence, the HMM problem we solve is a special case of HMM learning with linear
constraints. In fact, our work solves a somewhat more general problem than what
we describe above, since it allows us to find not just the transition and emission
probabilities, but also the starting state X0 and the high-level parameter C.
1.5 Organization of Thesis
Chapter 2 describes our HMM model for the retail scenario, namely the Sales-Refills
model. We also note down our key results for parameter estimation in this model, in
the form of two theorems.
Chapter 3 describes our estimation algorithm. We define estimators for each parameter mentioned in Theorem 1.
Chapter 4 describes the proof of correctness of our estimation algorithm. In this chapter, we argue
for the consistency of our estimators, by showing that all the estimators defined in chapter 3
converge almost surely to the true quantities. Chapters 3 and 4 together constitute
our proof of Theorem 1.
Chapter 5 describes our sample complexity bounds for stock estimation. We show
that in addition to being consistent, our estimators are also efficient, in the sense of
giving error bounds from polynomially many samples. This chapter gives the proof
of Theorem 2.
Chapter 6 offers extensions of the Sales-Refills model, showing how our model can
be learnt even with some generalizations. In addition, it outlines our idea for the Noisy
Sales-Refills model, which would overcome the key limitations of the basic model.
Chapter 7 summarizes our work, discusses its significance and gives pointers for future
work.
Chapter 2
Model Description and Estimation
2.1 Definitions
To describe our mathematical model for the retail system, we shall first define a
general order-k-hmm.
Definition 2 (Order-k-hmm). For any natural number $k$, we define an order-$k$-hmm
to consist of two sequences of random variables $X_0, X_1, X_2, \ldots$ and $Y_0, Y_1, Y_2, \ldots$ such
that the $X_i$'s form a Markov chain, and $Y_i$ is a function of $X_i, X_{i-1}, \ldots, X_{i-k+1}$ for all
$i \in \mathbb{N}$. We shall call $X_0, X_1, X_2, \ldots$ the sequence of hidden states, and $Y_0, Y_1, Y_2, \ldots$
the sequence of observations. Thus, we allow each observation to depend on the past
$k$ states.
Definition 3 (Shifted order-k-hmm). Given an order-$k$-hmm with state sequence
$X_0, X_1, X_2, \ldots$ and observation sequence $Y_0, Y_1, Y_2, \ldots$, we say that the observation
sequence has shift $d$ if $Y_i$ is a function of $X_{d+i}, X_{d+i-1}, \ldots, X_{d+i-k+1}$, for all $i$.
Our HMM model shall be an order-2-hmm, with the observation sequence having
shift 1. As a directed graphical model, this can be represented as in figure 2.1.
[Figure 2.1: Shifted order-2-hmm, representing the hidden state variables $X_t$'s and observation variables $Y_t$'s. Notice that the observation sequence has shift 1.]
2.2 The Sales-Refills Model

2.2.1 Overview
Our model describes how the inventory or stock for an item in a retail store evolves
with time. The random variable $X_t$ denotes the stock at time $t$.¹ We assume that
the stock evolves according to a Markov process, i.e. $X_0, X_1, X_2, \ldots, X_T$ form a Markov
chain (MC). Since the stock is not observable, this is a hidden MC. We also have transactions or sales, which we denote by the random variable $Y_t$, at time $t$. Sales are observed in our model. Taken together, the sequences $X_0, X_1, \ldots, X_T$ and $Y_0, Y_1, \ldots, Y_T$
form a shifted order-2-hmm with shift 1 (figure 2.1).

Our model incorporates two ways in which stock can change: sales and refills.
Whenever the stock is non-zero, there is a probability $q$ that a customer buys an
item. For simplicity, we shall assume that a customer can only purchase one item at
a time.² Since sales are observed, we have $Y_t = 1$ if a sale occurs at time $t$. Sales
decrease the current stock by 1. We also allow refill events. If the current stock is
smaller than a number $R$, the store manager may decide to order a fresh batch of
stock. We assume that the stock refill happens with probability $p$, and this increases
the stock by a fixed amount $C$. Clearly, $R < C$. Our refill policy is thus a variant of
the popular $(s, S)$ policy for inventory management ([15]). In our model, refills are
not observed. We also allow sales and refills to occur independently of each other.³

¹We assume there is a notion of time. For more details, we refer the reader to [6].
²Some justification for this assumption is provided in [6], by suitably defining non-purchase events.
³In this paper, we use the term stock for inventory and sales for transactions during the formal
treatment. We use $R$ and $C$ respectively for the $s$ and $S$ in the modified $(s, S)$ policy.
2.2.2 Formal definition
We will now formally describe our HMM as a generative model. We assume that
$X_t \in \{0, 1, 2, \ldots, S\}$, where $S = C + R$ is the maximum stock value. The initial stock
value, viz. $X_0$, is a parameter in our model. Knowing $X_t$, we show how to generate $X_{t+1}$
as well as $Y_t$. For this, we define two variables, $\Delta_{s,t}$ and $\Delta_{r,t}$, representing the change in
stock due to sales and refills respectively, at time $t$.

(a) $\Delta_{s,t}$: Sales decrement.

i. When $X_t > 0$:
$$\Delta_{s,t} = \begin{cases} 1 & \text{w.p. } q \\ 0 & \text{w.p. } 1 - q \end{cases}$$

ii. When $X_t = 0$: $\Delta_{s,t} = 0$.

(b) $\Delta_{r,t}$: Refill increment.

i. When $X_t \le R$:
$$\Delta_{r,t} = \begin{cases} C & \text{w.p. } p \\ 0 & \text{w.p. } 1 - p \end{cases}$$

ii. When $X_t > R$: $\Delta_{r,t} = 0$.

We now define $X_{t+1}$ and $Y_t$ as follows:
$$Y_t \triangleq \Delta_{s,t}, \qquad X_{t+1} \triangleq X_t + \Delta_{r,t} - \Delta_{s,t}.$$
The above tells us how, starting from a particular $X_0$, we can generate a sequence
of state variables $X_1, X_2, \ldots, X_T$ and a sequence of observations $Y_0, Y_1, \ldots, Y_T$. This
completes our description of the model. For clarity, the possible transitions of the
stock MC are shown diagrammatically in figure 2.2.
stock MC are shown diagrammatically in figure 2.2.2.
C+R
C+1
Sale
(q)
Refill
(p)
C
R+1
Sale
(q)
R
Refill
(p)
1
Sale
(q)
0
Figure 2.2: State transition diagram for the stock MC. Sales occur in all states besides
0, while refills only occur in states ≤ R.
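The generative description above is easy to simulate directly; the following sketch (ours, with illustrative names) implements it verbatim and can produce test data for the estimators of chapter 3:

```python
import random

def simulate_sales_refills(X0, C, R, p, q, T, seed=0):
    """Generate states X_0..X_T and observations Y_0..Y_{T-1} from the
    Sales-Refills model of section 2.2.2. Sales (prob. q, stock > 0)
    and refills (prob. p, stock <= R) occur independently."""
    rng = random.Random(seed)
    X, Y = [X0], []
    for _ in range(T):
        Xt = X[-1]
        d_s = 1 if Xt > 0 and rng.random() < q else 0   # sales decrement
        d_r = C if Xt <= R and rng.random() < p else 0  # refill increment
        Y.append(d_s)              # Y_t = Delta_{s,t}: only sales are observed
        X.append(Xt + d_r - d_s)   # X_{t+1} = X_t + Delta_{r,t} - Delta_{s,t}
    return X, Y
```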
2.3 Estimation
For the Sales-Refills model of section 2.2.2, assuming that we know R, we claim that
it is possible to obtain consistent estimates of all parameters of interest. Further,
this will allow us to estimate the hidden stock accurately at all times. We formalize
this claim in two theorems. For the sample complexity bounds, we shall make an
additional assumption about C, namely C ∈ {K, K + 1, . . . , 2K − 1} for some known
integer K. We call this assumption ‘restricted C’. We shall describe how it can be
removed in section 3.4.
Theorem 1 (Consistent Estimation). Let $R$ be known. Then, there exist estimators
$\hat{q}^T, \hat{C}^T, \hat{X}_0^T, \hat{p}^T$ and $\hat{X}_t^T$, based on the observations $Y_0^T = (Y_0, Y_1, \ldots, Y_T)$, so that as $T \to \infty$,
$$(\hat{q}^T, \hat{C}^T, \hat{X}_0^T, \hat{p}^T, \hat{X}_t^T) \xrightarrow{a.s.} (q, C, X_0 \bmod C, p, X_t \bmod C). \tag{2.1}$$
That is, our estimators converge almost surely.⁴

⁴Note that $a^T \xrightarrow{a.s.} a$ as $T \to \infty$ means that the random variable $a^T$ converges to $a$ with
probability 1, as $T \to \infty$.
The result states that there exist consistent model estimators. Next, we discuss
the finite sample error bound.
Theorem 2 (Sample Complexity). Let $R$ be known, and let $C$ satisfy the above-stated
restriction that $C \in \{K, \ldots, 2K-1\}$ for a known value of $K$. Then, for any $\varepsilon \in (0, 1)$,
$$P\left( \sup_{t \in \{0, \ldots, T\}} |(\hat{X}_t - X_t) \bmod C| > 0 \right) \le \varepsilon,$$
for all $T \ge \max\left( \frac{4B^2}{\varepsilon^2}, \frac{4}{\varepsilon} \right)$. The constant $B$ is defined explicitly in terms of the model
parameters, via equations 5.20, 5.13, 5.14, and 4.35.
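For a rough sense of scale, consider a purely hypothetical instantiation of this bound (the value of $B$ below is invented for illustration, not computed from the cited equations):

```latex
% Hypothetical: B = 50, target failure probability eps = 0.1.
T \;\ge\; \max\!\left(\frac{4B^2}{\varepsilon^2},\, \frac{4}{\varepsilon}\right)
  \;=\; \max\!\left(\frac{4 \cdot 2500}{0.01},\, 40\right)
  \;=\; 10^6 \;\text{observations}.
```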
Theorem 1 constitutes the main result of this paper, as it shows our model is
learnable, while Theorem 2 shows that the estimation procedure is efficient. As a
consequence, whenever the stock lies strictly between $R$ and $C$, our algorithm yields
the exact stock. On the other hand, if the stock is too low, i.e. smaller than $R$, there is
no way to discern whether a refill has occurred, under our algorithm or indeed any algorithm.
It should be noted that in practice one expects $R \ll C$, and hence this uncertainty is
of minimal relevance.
We prove the above theorems by giving explicit algorithms for parameter recovery,
in chapter 3. The proof of correctness of these algorithms is given in chapter 4, and
provides the proof of theorem 1. The proof of finite-sample error bounds (i.e. theorem
2) is described in chapter 5.
Chapter 3
Algorithm
We describe our methods for stock and parameter recovery.
3.1 Estimating stock (or inventory)
In order to estimate the stock correctly, we must first estimate parameters C and
X0 mod C. Then, we use these values to recover the hidden stock values.
3.1.1 Estimating $C$, $X_0 \bmod C$
Although $X_0$ is a parameter in our model, we shall only be interested in the value of
$X_0 \bmod C$, which we denote by $U$ for convenience. That is,
$$U \triangleq X_0 \bmod C. \tag{3.1}$$

Define the set $S_{c,i}^T$ as the set of time indices $t$ such that the number of observed
sales from 0 up to $t$ is congruent to $i$ modulo $c$. These sets are defined for each
$c \in \{K, \ldots, 2K-1\}$, and for a fixed $c$, for each $i \in \{0, 1, \ldots, c-1\}$.
Thus, we have the following sets, indexed by parameters $c$ and $i$:
$$S_{c,i}^T \triangleq \left\{ t \in \{1, \ldots, T\} : \sum_{j=0}^{t-1} Y_j \equiv i \bmod c \right\} \quad \forall c, i \tag{3.2}$$
Define the sale events for $c, i$ up to time $T$ as the number of time instants $t$ in
$S_{c,i}^T$ such that $Y_t > 0$. In other words,
$$E_{c,i}^T \triangleq \sum_{t \in S_{c,i}^T} \mathbf{1}_{(Y_t = 1)} \quad \forall c, i \tag{3.3}$$

Define the average window length for $c, i$ up to time $T$:
$$L_{c,i}^T \triangleq \frac{|S_{c,i}^T|}{E_{c,i}^T} \tag{3.4}$$

Define the maximizing indices:
$$(c_T^*, i_T^*) \triangleq \arg\max_{c,i} L_{c,i}^T \tag{3.5}$$

Then, $\hat{C}^T = c_T^*$ and $\hat{U}^T = i_T^*$ are our estimators for $C$ and $U$ respectively.
3.1.2 Estimating hidden stock
Using our estimators for $C$ and $U$, we can now easily recover the stock modulo $C$ for
all times. We define an estimator $\hat{X}_t^T$ for $X_t \bmod C$, as follows:
$$\hat{X}_t^T = \left( \hat{U}^T - \sum_{j=0}^{t-1} Y_j \right) \bmod \hat{C}^T \tag{3.6}$$
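The estimators of section 3.1 translate into a single counting pass per candidate $c$. A minimal sketch (ours; it assumes the restricted setting $C \in \{K, \ldots, 2K-1\}$ and the observations in a Python list `Y`):

```python
from collections import defaultdict

def estimate_C_U_stock(Y, K):
    """C-hat, U-hat via the longest average window (eqs. 3.2-3.5),
    then the stock modulo C-hat via equation (3.6)."""
    T = len(Y)
    best, C_hat, U_hat = -1.0, None, None
    for c in range(K, 2 * K):
        size = defaultdict(int)    # |S^T_{c,i}|
        events = defaultdict(int)  # E^T_{c,i}: sales observed inside S^T_{c,i}
        sales = 0                  # running sum Y_0 + ... + Y_{t-1}
        for t in range(1, T):
            sales += Y[t - 1]
            i = sales % c
            size[i] += 1
            events[i] += Y[t]
        for i in range(c):
            if events[i] == 0:
                continue
            L = size[i] / events[i]      # average window length L^T_{c,i}
            if L > best:
                best, C_hat, U_hat = L, c, i
    # Equation (3.6): recover the stock modulo C-hat at every time t.
    X_hat, sales = [U_hat], 0
    for t in range(1, T):
        sales += Y[t - 1]
        X_hat.append((U_hat - sales) % C_hat)
    return C_hat, U_hat, X_hat
```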
3.2 Estimating $q$ and $p$
Estimating q

Having already described the method for estimating $C$ and $U$ in section 3.1.1, we
now assume that these are known. In order to estimate $p$ and $q$ from the data, we
define the following empirically-obtained quantities:
$$T_S^T \triangleq \frac{\sum_{i \in \{0,\ldots,C-1\},\, i \ne U} |S_{C,i}^T|}{\sum_{i \in \{0,\ldots,C-1\},\, i \ne U} E_{C,i}^T} \quad \text{(average sale time)} \tag{3.7}$$
$$T_{SZ}^T \triangleq \frac{|S_{C,U}^T|}{E_{C,U}^T} = L_{C,U}^T \quad \text{(average sale time at `zero' stock)} \tag{3.8}$$

Our consistent estimator for $q$ is $\hat{q}^T$, where:
$$\hat{q}^T \triangleq \frac{1}{T_S^T} \tag{3.9}$$
Estimating p

We now look into estimating the refill probability, $p$. Let $\hat{q}$ be our estimate of $q$ from
the previous section. Define the function $f : [0, 1] \times [0, 1] \to \mathbb{R}^+$ as
$$f(p; q) = \frac{1}{p} \left( \frac{q - qp}{q - qp + p} \right)^R. \tag{3.10}$$
As argued in Lemma 5, for fixed $q$, $f$ is a strictly decreasing function in its first
argument $p$. That is, it has a well-defined inverse $f^{-1}$ in its first argument. Using this
property, we define
$$\hat{p}^T = f^{-1}\left( (T_{SZ}^T - T_S^T)_+ ; \hat{q}^T \right). \tag{3.11}$$
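The two estimators above reduce to a few lines given the counts $|S_{C,i}^T|$ and $E_{C,i}^T$ tabulated in the previous sketch; since $f(\cdot\,; q)$ is strictly decreasing (Lemma 5), its inverse can be computed by bisection. A minimal sketch (ours; helper names are illustrative):

```python
def estimate_q_p(size, events, C_hat, U_hat, R):
    """Estimators (3.7)-(3.11): size[i] = |S^T_{C,i}| and
    events[i] = E^T_{C,i}, tabulated for c = C_hat as in the previous sketch."""
    S_other = sum(size[i] for i in range(C_hat) if i != U_hat)
    E_other = sum(events[i] for i in range(C_hat) if i != U_hat)
    TS = S_other / E_other              # average sale time, eq. (3.7)
    TSZ = size[U_hat] / events[U_hat]   # average sale time at 'zero', eq. (3.8)
    q_hat = 1.0 / TS                    # eq. (3.9)

    def f(p):                           # eq. (3.10), with q = q_hat
        return (1.0 / p) * ((q_hat - q_hat * p) / (q_hat - q_hat * p + p)) ** R

    target = max(TSZ - TS, 0.0)         # (T_SZ - T_S)_+
    lo, hi = 1e-9, 1.0 - 1e-9
    for _ in range(100):                # bisection: f is strictly decreasing
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if f(mid) > target else (lo, mid)
    return q_hat, 0.5 * (lo + hi)       # p_hat via eq. (3.11)
```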
3.3 Computational efficiency
Our estimation algorithms, due to their simplicity, are computationally very efficient.
The most expensive step in our algorithm is computing $|S_{c,i}^T|$ and $E_{c,i}^T$ for each $c$
and $i$. By organizing the computations cleverly, we can do all computations for one value of $c$
in a single pass over the observations. Hence, the total time required for computing
all values equals $O((\text{no. of values of } c \text{ to try}) \cdot T) = O(K \cdot T)$. Since $K \le C < 2K$,
we can also write the time complexity as $O(C \cdot T)$.
3.4 General $C$

3.4.1 Idea
We extend the estimator described in section 3.1.1 to general $C$, not necessarily
between $K$ and $2K$. In the general case, we assume that $C \in \{C_{\min}, \ldots, C_{\max}\}$, where
$C_{\min}$ and $C_{\max}$ are two positive integers, and $C_{\max} \ge C_{\min} \ge 2$.¹ Our observations in
lemma 3 still hold. Namely, if we guess an incorrect value $C'$, the fraction of times we
shall see a longer interval equals $\frac{\gcd(C, C')}{C}$. However, it is now possible that we choose
a $C'$ which is a multiple of $C$, in which case the quantity above equals 1. Thus,
we are likely to be confused between the true $C$ and its multiples, since these will
have the same expected window length.

The way to resolve this is to notice that for large $T$, all multiples of $C$ will give
'approximately equal' window lengths, and this length will be the maximum among
all $(c, i)$ indices. Thus, we can look at all $(c, i)$ pairs for which $L_{c,i}^T$ lies within a suitable
distance $\delta$ of $\max_{c,i} L_{c,i}^T$. If we have sufficient data, all values of $c$ which attain such a
maximum should be multiples of the smallest of them, of the form $c_T^*, 2c_T^*, 3c_T^*, \ldots$ etc. We then
pick the smallest of these multiples, viz. $c_T^*$, as our estimator for $C$. For the above
value of $c$, there shall be exactly one value of $i$, viz. $i_T^*$, such that $L_{c_T^*, i_T^*}^T$ is close to
$\max_{c,i} L_{c,i}^T$. We pick $i_T^*$ as our estimator for $U$.

¹It may be of interest from a theoretical standpoint to consider the case when $C_{\max}$ is allowed
to depend on $T$. However, we do not consider this scenario in the current work.
3.4.2 General estimator for $C$, $U$
Assume that $C_{\max}$ and $C_{\min}$ are known, and the true $C$ lies between them. Our aim
is to determine the true $C$, along with the corresponding $X_0 \bmod C$ (i.e. $U$).

As in section 3.1.1, define sets $S_{c,i}^T$, for each $c \in \{C_{\min}, \ldots, C_{\max}\}$ (note the range),
and for a fixed $c$, for each $i \in \{0, 1, \ldots, c-1\}$. Similarly, define the sale events and
average window length up to time $T$, for each choice of the parameters $c$ and $i$. Thus,
we have the following quantities:
$$S_{c,i}^T \triangleq \left\{ t \in \{1, \ldots, T\} : \sum_{j=0}^{t-1} Y_j \equiv i \bmod c \right\} \quad \forall c, i \tag{3.12}$$
$$E_{c,i}^T \triangleq \sum_{t \in S_{c,i}^T} \mathbf{1}_{(Y_t = 1)} \quad \forall c, i \tag{3.13}$$
$$L_{c,i}^T \triangleq \frac{|S_{c,i}^T|}{E_{c,i}^T} \quad \forall c, i \tag{3.14}$$

Define the maximizing quantity:
$$L_T^* \triangleq \max_{c,i} L_{c,i}^T \tag{3.15}$$

Also define the following min-max quantity:
$$\tilde{L}_T \triangleq \max_c \min_i L_{c,i}^T \tag{3.16}$$

Define the following set of candidate indices for $C$, viz.
$$\mathcal{C}^T = \left\{ c \in \{C_{\min}, \ldots, C_{\max}\} : \max_i L_{c,i}^T \ge \frac{3}{4} L_T^* + \frac{1}{4} \tilde{L}_T \right\} \tag{3.17}$$

Now, we define the following indices:
$$\bar{C}^T \triangleq \min(\mathcal{C}^T) \tag{3.18}$$
$$\bar{U}^T \triangleq \arg\max_i L_{\bar{C}^T, i}^T \tag{3.19}$$

Then, $\bar{C}^T$ and $\bar{U}^T$ are our general estimators for $C$ and $U$ respectively.
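A sketch of the general estimators (ours), assuming the table `L[c][i]` of average window lengths $L_{c,i}^T$ has been computed as in section 3.1.1 for every $c \in \{C_{\min}, \ldots, C_{\max}\}$:

```python
def estimate_C_U_general(L, Cmin, Cmax):
    """General estimators (3.15)-(3.19): threshold the per-c maxima
    between the global max L*_T and the min-max statistic L~_T."""
    per_c_max = {c: max(L[c].values()) for c in range(Cmin, Cmax + 1)}
    per_c_min = {c: min(L[c].values()) for c in range(Cmin, Cmax + 1)}
    L_star = max(per_c_max.values())                 # eq. (3.15)
    L_tilde = max(per_c_min.values())                # eq. (3.16)
    threshold = 0.75 * L_star + 0.25 * L_tilde       # eq. (3.17)
    candidates = [c for c, m in per_c_max.items() if m >= threshold]
    C_bar = min(candidates)                          # eq. (3.18)
    U_bar = max(L[C_bar], key=L[C_bar].get)          # eq. (3.19)
    return C_bar, U_bar
```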
Chapter 4
Proof of Estimation
We now prove that the methods described in chapter 3 give correct answers. We start
by proving an important lemma.
4.1 Invariance modulo $C$
We shall prove the following lemma, which states that given the true $C$ and $U$, it is
possible for us to determine the stock modulo $C$ at all times. This lemma captures
the observability of our model.

Lemma 1. In a Sales-Refills model, let $X_0$ represent the starting state, $X_t$ the
hidden state at time $t$, $C$ the refill quantity, and let $U \triangleq X_0 \bmod C$.
Then the following congruence relation holds modulo $C$:
$$X_t \equiv U - \sum_{0 \le i \le t-1} Y_i \pmod{C} \quad \forall t \in \{1, 2, \ldots, T\} \tag{4.1}$$
Proof. Intuitively, if we look at the value of the hidden state modulo $C$, it can change
in only two ways: by either a sale or a refill. Since sales are observed, we can simply
subtract them out to get the updated value of $X$. Although refills are unobserved,
they do not affect the value of $X \bmod C$. Hence, it is possible to determine the
value of $X_t \bmod C$ from the observations alone. To prove this formally, we use the
definitions of $X_t$ and $Y_t$:
$$X_{t+1} = X_t + \Delta_{r,t} - \Delta_{s,t} \quad \text{(by definition)}$$
$$Y_t = \Delta_{s,t} \quad \text{(by definition)}$$
Since $\Delta_{r,t} \in \{0, C\}$,
$$\Delta_{r,t} \equiv 0 \pmod{C} \quad \forall t,$$
$$\therefore\ X_{t+1} \equiv X_t - \Delta_{s,t} \pmod{C},$$
$$\therefore\ X_{t+1} \equiv X_t - Y_t \pmod{C}.$$
Adding up these equations for $0 \le t \le t_0 - 1$, we get:
$$X_{t_0} \equiv X_0 - \sum_{0 \le t \le t_0 - 1} Y_t \pmod{C},$$
$$\therefore\ X_t \equiv U - \sum_{0 \le i \le t-1} Y_i \pmod{C}. \qquad \blacksquare$$
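As a quick empirical sanity check of Lemma 1 (ours, using the hypothetical `simulate_sales_refills` sketch from section 2.2.2):

```python
# Check X_t = U - (Y_0 + ... + Y_{t-1}) (mod C) on simulated data.
X, Y = simulate_sales_refills(X0=7, C=10, R=3, p=0.3, q=0.5, T=1000)
U, running_sales = 7 % 10, 0
for t in range(1, len(X)):
    running_sales += Y[t - 1]
    assert X[t] % 10 == (U - running_sales) % 10  # Lemma 1 holds at every t
```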
4.2 Correctness of $\hat{C}^T$ and $\hat{U}^T$
We first prove a few lemmas regarding the quantities defined in section 3.1.1. To that
end, define
$$P_{NR} \equiv P_{NR}(p, q) = \left( \frac{q - qp}{q + p - qp} \right)^R. \tag{4.2}$$
Lemma 2. Let $L_{c,i}$ be defined as in equations (3.2)-(3.4). Then,
$$E[L_{C,U}] = \frac{1}{q} + \frac{P_{NR}}{p}, \tag{4.3}$$
$$E[L_{C,i}] = \frac{1}{q}, \quad \forall i \in \{0, 1, \ldots, C-1\} \setminus \{U\}. \tag{4.4}$$

Proof. From definition (3.2) and lemma 1, we can re-write $S_{C,i}^T$ as
$$S_{C,i}^T = \{ t \in \{1, 2, \ldots, T\} : X_t \equiv U - i \bmod C \}.$$
Thus, $S_{C,U}^T$ represents the set of times when the stock equals $0 \bmod C$, and hence contains
all instants when the stock is empty (or equal to $C$). Now, $L_{C,i}$ is the average window
length in $S_{C,i}^T$, and represents the average length of time we need to wait in order to
observe a sale (immediately after the previous sale). Hence if $i \ne U$, then the stock is
always non-empty, so $L_{C,i}^T$ is the average time for a sale from a non-empty stock. Since
the sale time is a geometric random variable with success probability $q$, we get:
$$E[L_{C,i}^T] = \frac{1}{q}, \quad \forall i \ne U.$$
The distribution for $i = U$ is more complex. In our refill policy, we allow for the
possibility that the stock has been refilled before reaching 0. We call this an 'early
refill'. Thus, if we observe the stock value to be 0 modulo $C$, the true stock could be
either 0 or $C$. We need to treat these cases separately.

We define the no-refill probability $P_{NR}$ in section 4.7 to be the long-term probability of the event that when the stock attains a value congruent to 0 modulo $C$
immediately after a sale, the true stock is actually 0. When this is true, we need
to wait for a refill before we can observe the next sale. Thus, with probability $P_{NR}$,
the window length is a sum of two geometric random variables, one of which has mean
$\frac{1}{p}$ and the other has mean $\frac{1}{q}$. With probability $1 - P_{NR}$, the stock is refilled before
reaching 0, so the window length is a geometric random variable with mean $\frac{1}{q}$. Hence,
we can write:

Average window length in $S_{C,U}^T$ = (average sale time) + $P_{NR}$ $\cdot$ (average refill time $\mid$ no early refill),
$$\therefore\ E[L_{C,U}] = \frac{1}{q} + \frac{P_{NR}}{p}. \qquad \blacksquare$$
Lemma 3. For a Sales-Refills model of section 2.2, with the definitions described in
lemma 2, the following holds:
$$E[L_{c,i}] \le \frac{1}{q} + \frac{P_{NR}}{2p} \quad \forall c \ne C,\ \forall i. \tag{4.5}$$
Proof. In this case, we do not assume that our $c$ equals the true $C$, so it is more
difficult to interpret the various sets $S_{c,i}^T$. To that end, think of the sequence of inter-sale gaps as representing a sequence of intervals. The interval lengths are random,
with two possible means: either $d_1 = \frac{1}{q}$ or $d_2 = \frac{1}{q} + \frac{P_{NR}}{p}$. Clearly $d_2 > d_1$. The
intervals with mean $d_2$ occur at a fixed period of once every $C$ intervals, and all
other intervals have mean $d_1$. The leftmost interval with mean $d_2$ occurs at position
$U \in \{0, 1, \ldots, C-1\}$. Our problem can be stated as figuring out the correct period
$C$ and starting position $U$ for these longer intervals.

We attempt to solve this by guessing a value of $C$ (which we call $c$) and a value
of $U$ (which we call $i$). We then compute the average interval length, of intervals
starting at $i$ and at a gap of $c$ intervals respectively, up to a total of $T$ intervals. We
call this average length $L_{c,i}^T$. Clearly, if we guess the correct values of $c$ and $i$, viz.
$C$ and $U$ respectively, then we shall obtain an interval length whose average value is $d_2$,
the largest possible. On the other hand, if we guess an incorrect value of $c$, say $C'$,
then the question is what is the longest average interval length that will be observed.
We would like to show that it is strictly less than $d_2$, so as to allow for our ability to
distinguish such an incorrect choice of $C'$ from the true value of $C$.

Indeed, we shall establish this fact. Let $C' \ne C$. Then we shall observe the longer
intervals at most a fraction $\frac{\gcd(C', C)}{C}$ of the time, for any value of $i$. We establish this
fact next.

There are two possibilities for a particular choice of $C' \ne C$ and $i$. One possibility
is that we never observe longer intervals, e.g. if all longer intervals are even-numbered,
and we are only looking at odd-numbered intervals. In this case, the fraction of longer
intervals is 0. Now, suppose that our choice of $C'$ and $i$ is such that we observe at least
one longer interval. Then the next time we observe a longer interval is exactly when $C'$
and $C$ have a common multiple. This will happen once in every $\mathrm{lcm}(C, C')$ intervals.
Meanwhile, our average length $L_{C',i}^T$ is computed by considering one interval in every
$C'$ intervals. Thus, over the long term, the fraction of longer intervals in our average
equals:
$$\frac{1/\mathrm{lcm}(C, C')}{1/C'} = \frac{C'}{\mathrm{lcm}(C, C')} = \frac{\gcd(C, C')}{C}.$$
For any $C'$ which is not a multiple of $C$, this quantity is at most $\frac{1}{2}$. Now, in any set
of the form $\{K, K+1, \ldots, 2K-1\}$, where $K$ is a positive integer, there are no two
numbers such that one is a multiple of the other. Thus, at most half the intervals in
our average shall have mean $d_2$. Hence, we can write:
$$E[L_{c,i}^T] = d_1 \cdot (\text{fraction of intervals with mean } d_1) + d_2 \cdot (\text{fraction of intervals with mean } d_2)$$
$$= d_1 + (d_2 - d_1) \cdot (\text{fraction of intervals with mean } d_2)$$
$$\le d_1 + \frac{1}{2}(d_2 - d_1) \quad (\text{since } d_2 - d_1 > 0)$$
$$\le \frac{1}{q} + \frac{1}{2} \frac{P_{NR}}{p} \quad \forall c \ne C,\ \forall i,$$
since by assumption, $L_{c,i}^T$ is only defined for $c \in \{K, K+1, \ldots, 2K-1\}$. This proves
the required result. $\blacksquare$
Combining lemmas 2 and 3, we are ready to prove the correctness of our estimators.
Proposition 1. Under the setup of Theorem 1, as $T \to \infty$,
$$(\hat{C}^T, \hat{U}^T) \xrightarrow{a.s.} (C, U).$$
Proof. From lemmas 2 and 3, we get:
$$E[L_{c,i}^T] \begin{cases} = \frac{1}{q} + \frac{P_{NR}}{p}, & c = C,\ i = U, \quad (4.6) \\ \le \frac{1}{q} + \frac{P_{NR}}{2p}, & \text{o.w.} \quad (4.7) \end{cases}$$
Moreover, since $L_{c,i}^T$ is an empirical average of integrable i.i.d. random variables,
we have by the Strong Law of Large Numbers that
$$L_{c,i}^T \xrightarrow{a.s.} E[L_{c,i}^T] \quad \forall c, i,$$
$$\therefore\ \arg\max_{c,i} L_{c,i}^T \xrightarrow{a.s.} \arg\max_{c,i} E[L_{c,i}^T] = (C, U) \quad \text{(from equation 4.7)},$$
$$\therefore\ (\hat{C}^T, \hat{U}^T) \xrightarrow{a.s.} (C, U). \qquad \blacksquare \tag{4.8}$$

4.3 Correctness of $\hat{X}_t^T$
Proposition 2. Under the setup of Theorem 1, as $T \to \infty$,
$$\hat{X}_t^T \xrightarrow{a.s.} X_t \bmod C. \tag{4.9}$$

Proof. This follows directly from lemma 1 and the fact that $(\hat{C}^T, \hat{U}^T) \xrightarrow{a.s.} (C, U)$ as
per Proposition 1. $\blacksquare$
4.4 Correctness of $\hat{q}^T$
Proposition 3. Under the setup of Theorem 1,
$$\hat{q}^T \xrightarrow{a.s.} q, \quad \text{as } T \to \infty. \tag{4.10}$$

Proof. The quantity $T_S^T$ above represents the average length of time for a sale to occur,
when the initial stock does not equal 0 modulo $C$. Since the stock must be non-empty
in such instances, the length of time for a sale is a geometric random variable with
success probability $q$. Hence, $E[T_S^T] = \frac{1}{q}$. Moreover, since $T_S^T$ is the empirical average
of integrable i.i.d. random variables, by the Strong Law of Large Numbers, we have
$$T_S^T \xrightarrow{a.s.} E[T_S^T] = \frac{1}{q}. \tag{4.11}$$
And hence
$$\hat{q}^T \triangleq \frac{1}{T_S^T} \xrightarrow{a.s.} q. \qquad \blacksquare \tag{4.12}$$

4.5 Correctness of $\hat{p}^T$

First, we state a few lemmas.
Lemma 4. Under the setup of Theorem 1,
$$E[T_{SZ}^T - T_S^T] = f(p; q),$$
where $f$ is defined in (3.10).

Proof. From lemma 2 and proposition 3, we know that:
$$E[T_{SZ}^T] = E[L_{C,U}^T] = \frac{1}{q} + \frac{P_{NR}}{p},$$
$$E[T_S^T] = \frac{1}{q}.$$
Hence,
$$E[T_{SZ}^T - T_S^T] = \frac{P_{NR}}{p}. \tag{4.13}$$
From equations (3.10), (4.13) and (4.35),
$$E[T_{SZ}^T - T_S^T] = f(p; q). \qquad \blacksquare$$
Lemma 5. Given fixed $R \ge 0$, let $f : [0, 1] \times [0, 1] \to \mathbb{R}^+$ be defined as
$$f(p; q) = \frac{1}{p} \left( \frac{q - qp}{q - qp + p} \right)^R. \tag{4.14}$$
Then for any $q \in (0, 1)$, $f$ is a strictly monotonically decreasing function of the variable
$p \in (0, 1)$.

Proof. Effectively, $f$ is a product of two terms, each of which is non-increasing in $p$
(for fixed $q \in (0, 1)$ and $R \ge 0$). Clearly, the first term, $1/p$, is a strictly decreasing
function of $p \in (0, 1)$. The second term is
$$\left( \frac{q - qp}{q - qp + p} \right)^R = \left( \frac{1}{1 + \frac{p}{q(1-p)}} \right)^R.$$
If $R = 0$, then it is constant. If $R > 0$, then the second term is a strictly decreasing
function of $p \in (0, 1)$, since $p/(1-p)$ is strictly increasing in $p$. Therefore, $f$ is a strictly
decreasing function of $p$, for fixed $q$ and $R$. $\blacksquare$
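A quick numerical spot-check of this monotonicity claim (ours; the parameter values are arbitrary):

```python
# Evaluate f(p; q) of equation (4.14) on a grid and confirm it decreases.
R, q = 3, 0.5

def f(p):
    return (1.0 / p) * ((q - q * p) / (q - q * p + p)) ** R

vals = [f(k / 100) for k in range(1, 100)]          # p = 0.01, ..., 0.99
assert all(a > b for a, b in zip(vals, vals[1:]))   # strictly decreasing
```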
Proposition 4. Under the setup of Theorem 1, as $T \to \infty$,
$$\hat{p}^T \xrightarrow{a.s.} p. \tag{4.15}$$

Proof. First, note that both $T_S^T$ and $T_{SZ}^T$ are empirical averages of i.i.d. random
variables with finite mean. By the Strong Law of Large Numbers,
$$T_S^T \xrightarrow{a.s.} \frac{1}{q},$$
$$T_{SZ}^T \xrightarrow{a.s.} \frac{1}{q} + \frac{P_{NR}}{p}.$$
Hence,
$$T_{SZ}^T - T_S^T \xrightarrow{a.s.} \frac{P_{NR}}{p} = f(p; q).$$
Given $q$, the function $f$ is strictly decreasing in $p$. Therefore, with respect to the argument
$p$, there exists an inverse of $f$, which we denote $f^{-1} \equiv f^{-1}(\cdot\,; q)$. Because we do not
have access to the true $q$, as stated in Section 3.2, we use our estimate $\hat{q}^T$ to make
$f^{-1}$ computable from observations. That is, we use $f^{-1}(\cdot\,; \hat{q}^T)$ to obtain $\hat{p}^T$ as
$$\hat{p}^T = f^{-1}\left( (T_{SZ}^T - T_S^T)_+ ; \hat{q}^T \right). \tag{4.16}$$
That is,
$$f(\hat{p}^T; \hat{q}^T) = (T_{SZ}^T - T_S^T)_+. \tag{4.17}$$
As argued before, we have
$$\hat{q}^T \xrightarrow{a.s.} q, \tag{4.18}$$
$$(T_{SZ}^T - T_S^T)_+ \xrightarrow{a.s.} f(p, q).$$
That is,
$$f(\hat{p}^T; \hat{q}^T) \xrightarrow{a.s.} f(p, q). \tag{4.19}$$
Now $f$ is a continuous function and it is strictly decreasing in its first argument.
Therefore, by Lemma 6 and (4.18) and (4.19), it follows that $\hat{p}^T \to p$. This
completes the proof. $\blacksquare$
We state a useful analytic fact.

Lemma 6. Consider a function $f : [0, 1] \times [0, 1] \to \mathbb{R}^+$. Let $f$ be continuous. Further, for
any given $q \in (0, 1)$, let $f(\cdot, q)$ be a strictly decreasing function (in its first argument).
Let there be $(p_n, q_n, x_n)$, $n \ge 1$, so that $f(p_n, q_n) = x_n$ for all $n$; as $n \to \infty$, $q_n \to q$
and $x_n \to f(p, q)$ for some $(p, q) \in (0, 1) \times (0, 1)$. Then $p_n \to p$ as $n \to \infty$.

Proof. Suppose to the contrary that $p_n$ does not converge to $p$. Without loss of generality, let $p^* = \liminf p_n < p$. That is, there exists a subsequence $n_k \to \infty$ as $k \to \infty$
so that $p_{n_k} \to p^* < p$. That is, for all $k$ large enough, $p_{n_k} < p$, and we shall consider
only such values of $n_k$. Using the fact that $f(\cdot, q)$ is strictly decreasing, we have that
$$f(p_{n_k}, q) > f(p, q) + \delta, \tag{4.20}$$
for some $\delta > 0$. Since $q_n \to q$ and $f$ is a continuous function, we have that for all $n$
large enough,
$$|f(p_n, q_n) - f(p_n, q)| < \delta/4. \tag{4.21}$$
Now considering $n_k$ for $k$ large enough so that both (4.20) and (4.21) are satisfied,
we obtain
$$f(p_{n_k}, q_{n_k}) > f(p_{n_k}, q) - \delta/4 > f(p, q) + 3\delta/4.$$
But by the assumption of the Lemma, we have that $f(p_{n_k}, q_{n_k}) \to f(p, q)$. Therefore, our
assumption is wrong, and hence $p_n \to p$ as desired. $\blacksquare$
4.6 Correctness of $\bar{C}^T$, $\bar{U}^T$
The correctness of the general estimators follows directly from some simple facts. We
shall prove these in the form of lemmas.

Lemma 7. With respect to the quantities defined in section 3.4.2, the following hold:
$$L_T^* \xrightarrow{a.s.} \frac{1}{q} + \frac{P_{NR}}{p} \tag{4.22}$$
$$\tilde{L}_T \xrightarrow{a.s.} \frac{1}{q} + \frac{1}{C} \cdot \frac{P_{NR}}{p} \tag{4.23}$$
$$\mathcal{C}^T \xrightarrow{a.s.} \{ c \in \{C_{\min}, \ldots, C_{\max}\} : c = kC \text{ for some } k \in \mathbb{N} \}\,^1 \tag{4.24}$$

¹By almost sure convergence of a sequence of sets, we mean the following: for every possible
element, the sequence of indicator functions corresponding to its membership in the sets converges
almost surely. For finite sets, this means the sequence of sets eventually becomes constant.
Proof. We shall prove the above convergence relations one at a time. Consider the
first relation:
$$L_T^* \xrightarrow{a.s.} \frac{1}{q} + \frac{P_{NR}}{p},$$
where $L_T^* \triangleq \max_{c,i} L_{c,i}^T$. From lemma 3, we obtain the following:
$$E[L_{c,i}^T] \begin{cases} = \frac{1}{q} + \frac{P_{NR}}{p}, & c = kC,\ i \bmod C = U, \\ \le \frac{1}{q} + \frac{P_{NR}}{2p}, & \text{o.w.} \end{cases} \tag{4.25}$$
Moreover, since $L_{c,i}^T$ is an empirical average of i.i.d. integrable random variables,
$L_{c,i}^T \xrightarrow{a.s.} E[L_{c,i}^T]$ for all $c, i$ as $T \to \infty$. Hence,
$$\max_{c,i} L_{c,i}^T \xrightarrow{a.s.} \max_{c,i} E[L_{c,i}^T] = \frac{1}{q} + \frac{P_{NR}}{p}.$$
Since the LHS equals $L_T^*$, this proves equation 4.22.
Now, consider the second relation:
a.s.
L̃T −−→
1
1
1
+ PN R
q C
p
To prove this, we rely on a slightly deeper understanding of lemma 3. As described in the lemma, consider the sequence of inter-sale intervals, with longer intervals occurring regularly at some period. U and C represent respectively the starting point and period of the longer intervals. As proved in the lemma, if we guess a C′ ≠ C, the fraction of times we shall see the longer intervals is given by gcd(C, C′)/C, provided at least one longer interval is observed.

However, since we are minimizing over the starting point, we now ask the question: for which C′ is it possible to observe a sequence with no longer intervals? It turns out that for all C′ which have a common factor with C (i.e. gcd(C, C′) > 1), there exists a starting location such that no longer intervals are observed. This can be proved as follows: suppose gcd(C, C′) = x, and suppose U, U′ are the true and assumed starting states, such that U ≢ U′ mod x. Since both C and C′ are congruent to 0 modulo x, the true sequence with C, U and the assumed sequence with C′, U′ will never overlap, since they will be different modulo x. Conversely, if C and C′ are co-prime, then for every starting point, we shall observe some longer intervals, and their frequency is exactly 1/C.
Now, consider the definition of L̃_T, viz. L̃_T ≜ max_c min_i L^T_{c,i}. Since we are minimizing over i, for any c which has a common factor with C, asymptotically a starting state will be selected for which there are no longer intervals. Hence, the average window length of this sequence will tend to the average sale time. However, since we are also maximizing over the choice of c, this means that the c chosen in our min-max definition will asymptotically always be co-prime to C. For such a co-prime c, we shall observe the longer intervals only a fraction 1/C of the time, as mentioned above (by lemma 3). Thus, we obtain the required equation:

L̃_T →a.s. 1/q + (1/C) · P_NR/p.

This completes the proof of the second result.
We now consider the third equation, viz.:

C^T →a.s. { c : c = kC, for some k ∈ N }.

Recall the definition of C^T, viz.:

C^T = { c ∈ {C_min, . . . , C_max} : max_i L^T_{c,i} ≥ (3/4) L*_T + (1/4) L̃_T }.
Using equations (4.22) and (4.23), we see that asymptotically, the set C^T is close to C̄^T, where the latter is defined as follows:

C̄^T ≜ { c ∈ {C_min, . . . , C_max} : max_i L^T_{c,i} ≥ (3/4)(1/q + P_NR/p) + (1/4)(1/q + (1/C) · P_NR/p) },

i.e. C̄^T = { c ∈ {C_min, . . . , C_max} : max_i L^T_{c,i} ≥ 1/q + (3/4 + 1/(4C)) · P_NR/p }.    (4.26)

Note: C̄^T defined above is not an empirically obtained set, but because of equations (4.22) and (4.23), the empirically defined set C^T behaves like it asymptotically.
We define two sets, C̄^T_L and C̄^T_U, to bound the above set. These sets are defined as follows:

C̄^T_U ≜ { c ∈ {C_min, . . . , C_max} : max_i L^T_{c,i} ≥ 1/q + (3/4) · P_NR/p },    (4.27)

C̄^T_L ≜ { c ∈ {C_min, . . . , C_max} : max_i L^T_{c,i} ≥ 1/q + (7/8) · P_NR/p }.    (4.28)
Assuming C ≥ 2, it is easy to see that C̄^T_L ⊆ C̄^T ⊆ C̄^T_U. Now, we consider the behaviour of the two bounding sets as T goes to ∞. From lemma 3, we know that for every c that is not a multiple of C, max_i E[L^T_{c,i}] ≤ 1/q + (1/2) · P_NR/p. On the other hand, for every c that is a multiple of C, max_i E[L^T_{c,i}] = 1/q + P_NR/p. And since the L^T_{c,i}'s are empirical averages, they converge to their expectations as T → ∞. Hence:

max_i L^T_{c,i} →a.s. 1/q + P_NR/p,   for all c = kC for some k ∈ N,    (4.29)

max_i L^T_{c,i} →a.s. 1/q + (gcd(c, C)/C) · P_NR/p ≤ 1/q + (1/2) · P_NR/p,   otherwise.    (4.30)
Using the above equations, we can determine what the sets C̄^T_L and C̄^T_U look like asymptotically. In particular, we see that these sets converge to the following:

C̄^T_L →a.s. { c ∈ {C_min, . . . , C_max} : c = kC, for some k ∈ N },    (4.31)

C̄^T_U →a.s. { c ∈ {C_min, . . . , C_max} : c = kC, for some k ∈ N }.    (4.32)

Since C̄^T_L ⊆ C̄^T ⊆ C̄^T_U, and C̄^T_L, C̄^T_U both converge a.s. to the same set, C̄^T must also converge a.s. to the same limit set. Hence:

C̄^T →a.s. { c ∈ {C_min, . . . , C_max} : c = kC, for some k ∈ N }.

From equations (4.22) and (4.23), and from the definition of C̄^T in equation (4.26), it follows that C^T has the same convergence properties as C̄^T. In particular:

C^T →a.s. { c ∈ {C_min, . . . , C_max} : c = kC, for some k ∈ N }.

Thus we have proven the third statement of the lemma, viz. equation (4.24). This completes our proof of the lemma.
From equation (4.24) in lemma 7, it directly follows that:

min(C^T) →a.s. C,

i.e.

C̄^T →a.s. C.    (4.33)
This proves the consistency of our estimator for C. We now prove the consistency of our estimator for U. This proof is relatively straightforward. Since C is a discrete variable, the a.s. convergence of C̄^T to C means that with probability one, we shall have the correct C after finitely many steps. Once the correct C is obtained, we recover U as the index which maximizes the window length for this C.

It is easy to show that if C were known a priori, the estimator for U would be correct. This can be seen as follows:
∀ i ≠ U :  E[L^T_{C,U}] > E[L^T_{C,i}],
∀ i :  L^T_{C,i} →a.s. E[L^T_{C,i}],
∴ arg max_i L^T_{C,i} →a.s. arg max_i E[L^T_{C,i}],
∴ Ū^T →a.s. U.
Although we do not know C a priori in this case, we have shown above that we will know it eventually, with probability 1. Once C is known, all future observations can be regarded as independent events which will bias L_{C,U} toward a larger value than L_{C,i}, for all other i. Since the variables L_{c,i} are empirical averages, finitely many observations will not affect their long-term behaviour. Thus, eventually we shall recover the correct U as arg max_i L^T_{C,i}. Hence, our estimator for U is consistent as well. That is,

Ū^T →a.s. U,   where Ū^T ≜ arg max_{i ∈ {0, 1, ..., C̄^T − 1}} L^T_{C̄^T, i}.    (4.34)
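For concreteness, the following is a minimal sketch (with numpy as an implementation choice; the function names are ours) of how the estimators above can be computed from a binary sales sequence Y, implementing L^T_{c,i}, L*_T, L̃_T, C^T, and finally C̄^T = min(C^T) and Ū^T directly from their definitions.

```python
import numpy as np

def window_lengths(Y, c):
    """L^T_{c,i} for i = 0..c-1: mean window length over the index set
    S^T_{c,i} = {t : (number of sales strictly before t) = i mod c}."""
    Y = np.asarray(Y)
    before = np.concatenate(([0], np.cumsum(Y)[:-1]))  # sales strictly before t
    L = np.full(c, np.nan)
    for i in range(c):
        mask = (before % c) == i
        n_sales = int(Y[mask].sum())       # E^T_{c,i}: each window ends in a sale
        if n_sales > 0:
            L[i] = mask.sum() / n_sales    # |S^T_{c,i}| / E^T_{c,i}
    return L

def estimate_C_U(Y, c_min, c_max):
    """C̄^T = min(C^T) and Ū^T = argmax_i L^T_{C̄,i}, per eqs. (4.33)-(4.34)."""
    Ls = {c: window_lengths(Y, c) for c in range(c_min, c_max + 1)}
    L_star = max(np.nanmax(L) for L in Ls.values())    # L*_T
    L_tilde = max(np.nanmin(L) for L in Ls.values())   # L~_T
    threshold = 0.75 * L_star + 0.25 * L_tilde
    C_set = [c for c, L in Ls.items() if np.nanmax(L) >= threshold]
    C_hat = min(C_set)
    U_hat = int(np.nanargmax(Ls[C_hat]))
    return C_hat, U_hat
```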
4.7 No-refill probability
Definition 4. We define the No-Refill Probability P_NR as the long-term probability that the stock equals 0, given that it equals 0 modulo C, at a time instant immediately after a sale.

In order to estimate the refill probability p, we shall need to explicitly compute P_NR. Below, we provide an expression for this probability. To do this, we will first prove a lemma about dominance probabilities for Geometric random variables.
Lemma 8. Consider two independent Geometric random variables X and Y, generated from Bernoulli processes with success probabilities a and b respectively, both defined on the support {1, 2, . . .}. Then the probability that we see a success of the first process before the second is P(X < Y) = (a − ab) / (a + b − ab).
Proof. By definition:

P(X = i) = a(1 − a)^{i−1}   ∀ i ∈ {1, 2, . . .},
P(Y = i) = b(1 − b)^{i−1}   ∀ i ∈ {1, 2, . . .}.

∴ P(Y > i) = Σ_{j=i+1}^{+∞} b(1 − b)^{j−1} = b(1 − b)^i · (1/b) = (1 − b)^i.

∴ P(Y > X) = Σ_{i=1}^{+∞} P(X = i) P(Y > i)
    = Σ_{i=1}^{+∞} a(1 − a)^{i−1} (1 − b)^i
    = a(1 − b) / (1 − (1 − a)(1 − b))
    = (a − ab) / (a + b − ab).
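As a quick numerical sanity check of Lemma 8, one can compare the closed form against a Monte Carlo simulation of the two geometric clocks; the following sketch is illustrative only.

```python
import random

def race_prob(a, b, trials=200_000, seed=0):
    """Monte Carlo estimate of P(X < Y) for independent X ~ Geom(a),
    Y ~ Geom(b), both supported on {1, 2, ...}."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        x = 1
        while rng.random() >= a:  # count Bernoulli trials until first success
            x += 1
        y = 1
        while rng.random() >= b:
            y += 1
        wins += x < y
    return wins / trials

a, b = 0.3, 0.2
print(race_prob(a, b))                 # simulated, roughly 0.545
print((a - a * b) / (a + b - a * b))   # lemma 8: 0.24 / 0.44 = 0.5454...
```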
We now want to find the probability that we arrive at a stock of 0 without any refill occurring. Since refills only occur when the stock is ≤ R, we can be certain that when the stock equals R + 1 mod C, its actual value is R + 1. Immediately after the next sale, the value of the stock equals R. Thus, in order to see a stock of 0 the next time the stock equals 0 mod C, we need to see R consecutive sale transactions occurring before any refill. By the Markov property of our model, the probability that the (r + 1)-th sale transaction occurs before a refill transaction is independent of r. Thus, the probability of having 0 stock is simply the R-th power of the probability that the next sale occurs before a refill.

Thus, P_NR = P(Sale before Refill)^R. From lemma 8, this implies:

P_NR = ( (q − qp) / (q + p − qp) )^R.    (4.35)
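Equation (4.35) is straightforward to evaluate; the following one-function helper (the name is our own) is reused in later sketches.

```python
def p_no_refill(p, q, R):
    """No-refill probability P_NR of eq. (4.35): R consecutive sales must
    each beat the refill clock, independently by the Markov property."""
    return ((q - q * p) / (q + p - q * p)) ** R

# e.g. p_no_refill(0.1, 0.5, 3) = (0.45 / 0.55)^3, roughly 0.548
```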
Chapter 5
Sample complexity bounds
In this chapter, we analyze the amount of data (i.e. observations) required to learn our parameters effectively. We thereby establish the efficiency of our estimators.
5.1 Error bounds for stock estimation
We wish to find the probability that our estimators in section 3.1.1 return the wrong values, for a finite amount of data. That is, we want to find:

P((Ĉ^T, Û^T) ≠ (C, U)) = P((C, U) ≠ arg max_{c,i} L^T_{c,i})
    = P( ∪_{(c,i) ≠ (C,U)} { L^T_{c,i} ≥ L^T_{C,U} } )
    ≤ 2K² max_{(c,i) ≠ (C,U)} P(L^T_{c,i} ≥ L^T_{C,U})   (union bound).    (5.1)

We now bound the above probability for any (c, i) ≠ (C, U); by (c, i) ≠ (C, U), we mean that at least one of c ≠ C or i ≠ U holds. To do this, we first prove a lemma which expresses the distribution of these random variables in a convenient form.
Lemma 9. Define the random variables A ∼d Geometric(q), W ∼d Geometric(p), and Z ∼d Bernoulli(P_NR). Let A_1, A_2, . . . be i.i.d. random variables distributed identically to A, and likewise for W and Z. Let all the random variables defined above be independent of each other. (The notation X ∼d Y means that the r.v. X has the same distribution as the r.v. Y.)

With respect to the setup in theorem 1 and section 3.1.1, the following two properties hold:

I.  L^T_{C,U} ∼d (1/N_1) Σ_{i=1}^{N_1} (A_i + Z_i W_i),   where N_1 = E^T_{C,U};    (5.2)

II. L^T_{c,i} ∼d (1/N_2) [ Σ_{i=1}^{N_2} A_i + Σ_{i=1}^{N_3} Z_i W_i ],   where N_2 = E^T_{c,i} and N_3 ≤ (1/2) E^T_{c,i},    (5.3)

for any fixed values of (c, i) other than (C, U).
Proof. The proof follows largely from arguments made in section 4.2.

We first prove the statement for L^T_{C,U}. By definition, L^T_{C,U} = |S^T_{C,U}| / E^T_{C,U}. As noted in lemma 2, the set S^T_{C,U} consists of those time instants from 0 up to T for which the stock X_t satisfies X_t ≡ 0 mod C. This set consists of many different 'windows' of observations, where each window represents a set of contiguous time instants, ending with an instant where a sale occurs. No other time instant in a window has a sale, except the last. Hence, the number of such windows is exactly E^T_{C,U}. Moreover, each window represents either a sale time (if there is an early refill), or the sum of a refill and a sale time (otherwise). Accordingly, we define the Bernoulli random variable Z, which equals 1 if there is no early refill, which happens with probability P_NR. It may be noted that Z is independent of any other sale/refill times in S^T_{C,U}.

By definition, the RV A has the same distribution as a sale time, and the RV W has the same distribution as a refill time. Thus, a single window length in S^T_{C,U} is distributed identically to A with probability 1 − P_NR, and distributed as A + W the rest of the time. Hence, a single window length is distributed as A + ZW, and therefore the average window length, viz. L^T_{C,U}, is distributed as (1/E^T_{C,U}) Σ_{i=1}^{E^T_{C,U}} (A_i + Z_i W_i).
We now turn to proving the second statement, which concerns the distribution of L^T_{c,i} when (c, i) ≠ (C, U). Note that the same subdivision into E^T_{c,i} windows holds for the set of indices in S^T_{c,i}, as described above. But this time, we cannot exactly determine the stock modulo C in this set. However, following the argument of lemma 3, we can say that the fraction of times we see a 'longer interval' (i.e. a window in which X_t ≡ 0 mod C) is at most 1/2. If we let N_2 be the total number of windows, and let N_3 be the number of windows where the stock is congruent to 0 mod C, then N_3 ≤ (1/2) N_2. Also, in these N_3 windows, the window length is a sale time with probability 1 − P_NR, while the rest of the time it is the sum of a sale and a refill time. In the remaining N_2 − N_3 windows, the window length is simply a sale time. Also, N_2 = E^T_{c,i}, as noted earlier. Thus, the distribution of L^T_{c,i} is identical to:

(1/N_2) [ Σ_{i=1}^{N_2} A_i + Σ_{i=1}^{N_3} Z_i W_i ],   where N_2 = E^T_{c,i}, N_3 ≤ (1/2) N_2.
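Property I can be checked numerically; a small illustrative sketch (with arbitrarily chosen parameter values) confirms that the empirical mean of A + ZW matches 1/q + P_NR/p.

```python
import numpy as np

rng = np.random.default_rng(0)
p, q, R = 0.4, 0.3, 2
P_NR = ((q - q * p) / (q + p - q * p)) ** R   # eq. (4.35)
A = rng.geometric(q, size=200_000)            # sale times
W = rng.geometric(p, size=200_000)            # refill waiting times
Z = rng.random(200_000) < P_NR                # no-early-refill indicators
print((A + Z * W).mean())                     # approximately 1/q + P_NR/p
print(1 / q + P_NR / p)
```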
We now obtain a bound on the probability that L^T_{c,i} ≥ L^T_{C,U}, for any (c, i) ≠ (C, U).

Lemma 10. With respect to the setup in theorem 1 and section 3.1.1, fix any arbitrary (c, i) ≠ (C, U). Also, let N_1 = E^T_{C,U} and N_2 = E^T_{c,i}, and assume that N_1, N_2 ≥ N for some positive integer N. Then, the following holds:

P(L^T_{c,i} ≥ L^T_{C,U}) ≤ (1/N) (32p² / P_NR²) (σ_A² + σ_WZ²),    (5.4)

where σ_A² = (1 − q)/q² and σ_WZ² = ((2 − p) P_NR − P_NR²)/p².
Proof. First, notice that:

P(L^T_{c,i} ≥ L^T_{C,U}) ≤ P( {L^T_{C,U} ≤ 1/q + (3/4) P_NR/p} ∪ {L^T_{c,i} ≥ 1/q + (3/4) P_NR/p} )
    ≤ P(L^T_{C,U} ≤ 1/q + (3/4) P_NR/p) + P(L^T_{c,i} ≥ 1/q + (3/4) P_NR/p).    (5.5)

Now using Chebyshev's inequality, we have, for any random variable X,

P(|X − E[X]| ≥ ∆) ≤ σ_X² / ∆²   ∀ ∆ > 0,    (5.6)

where σ_X² ≜ E[X²] − E[X]² represents the variance of X.

Now, let us define X = L^T_{C,U}. Note that E[X] = 1/q + P_NR/p. Let the variance of L^T_{C,U} be denoted σ²_{C,U;T}. Then:

P(L^T_{C,U} ≤ 1/q + (3/4) P_NR/p) = P(X ≤ 1/q + (3/4) P_NR/p)
    = P(X − E[X] ≤ −P_NR/(4p))
    ≤ P(|X − E[X]| ≥ P_NR/(4p))
    ≤ ((4p)² / P_NR²) σ_X²   (using (5.6))
    = (16p² / P_NR²) σ²_{C,U;T}.

Thus,

P(L^T_{C,U} ≤ 1/q + (3/4) P_NR/p) ≤ (16p² / P_NR²) σ²_{C,U;T}.    (5.7)
Similarly, we can obtain a bound on the second term of equation (5.5). Define X = L^T_{c,i}, for any (c, i) ≠ (C, U). Note that E[X] ≤ 1/q + P_NR/(2p). Let the variance of L^T_{c,i} be denoted σ²_{c,i;T}. Then, in a similar manner to the above, we have:

P(L^T_{c,i} ≥ 1/q + (3/4) P_NR/p) = P(X ≥ 1/q + (3/4) P_NR/p)
    ≤ P(X − E[X] ≥ P_NR/(4p))   (since E[X] ≤ 1/q + P_NR/(2p))
    ≤ P(|X − E[X]| ≥ P_NR/(4p))
    ≤ ((4p)² / P_NR²) σ_X²   (using (5.6))
    = (16p² / P_NR²) σ²_{c,i;T}.

Thus,

P(L^T_{c,i} ≥ 1/q + (3/4) P_NR/p) ≤ (16p² / P_NR²) σ²_{c,i;T}.    (5.8)

Combining equations (5.5), (5.7) and (5.8), we get:

P(L^T_{c,i} ≥ L^T_{C,U}) ≤ (16p² / P_NR²) (σ²_{C,U;T} + σ²_{c,i;T}).    (5.9)
We further note that it is easy to bound the variances of L^T_{C,U} and L^T_{c,i}, as they are empirical averages of i.i.d. random variables. From lemma 9,

L^T_{C,U} ∼d (1/N_1) Σ_{i=1}^{N_1} (A_i + W_i Z_i),

∴ σ²_{C,U;T} = (1/N_1²) Σ_{i=1}^{N_1} (σ_A² + σ_WZ²) = (1/N_1) (σ_A² + σ_WZ²),

where σ_A² and σ_WZ² are the variances of the random variables A and WZ respectively, defined in lemma 9. Now, assuming N_1 ≥ N, we get:

σ²_{C,U;T} ≤ (1/N) (σ_A² + σ_WZ²).    (5.10)
We can easily derive the same for (c, i):

L^T_{c,i} ∼d (1/N_2) [ Σ_{i=1}^{N_2} A_i + Σ_{i=1}^{N_3} W_i Z_i ],

∴ σ²_{c,i;T} = (1/N_2²) [ Σ_{i=1}^{N_2} σ_A² + Σ_{i=1}^{N_3} σ_WZ² ] = (1/N_2²) (N_2 σ_A² + N_3 σ_WZ²),

∴ σ²_{c,i;T} ≤ (1/N_2) (σ_A² + σ_WZ²)   (since N_3 ≤ N_2, from lemma 9).

Since N_2 ≥ N, we have:

σ²_{c,i;T} ≤ (1/N) (σ_A² + σ_WZ²).    (5.11)
We use equations (5.10) and (5.11) along with (5.9) to obtain:

P(L^T_{c,i} ≥ L^T_{C,U}) ≤ (1/N) (32p² / P_NR²) (σ_A² + σ_WZ²).    (5.12)

We shall express the variances on the RHS in terms of the model parameters. Using the definitions of the random variables A, W, and Z from lemma 9, we compute σ_A² and σ_WZ²:

σ_A² ≜ E[A²] − (E[A])²,   where A ∼d Geometric(q)
    = (2 − q)/q² − 1/q²,
∴ σ_A² = (1 − q)/q².    (5.13)

σ_WZ² ≜ E[(WZ)²] − (E[WZ])²,   where W ∼d Geometric(p), Z ∼d Bernoulli(P_NR)
    = E[W²] E[Z²] − (E[W])² (E[Z])²
    = ((2 − p)/p²) P_NR − (1/p²) P_NR²,
∴ σ_WZ² = ((2 − p) P_NR − P_NR²) / p².    (5.14)

For brevity, we continue to refer to the variances as σ_A² and σ_WZ², but their explicit forms can be found in equations (5.13) and (5.14). Equations (5.12), (5.13), and (5.14) together complete our proof.
Now, we are ready to prove our result:

Proposition 5. Under the setup of theorem 1, given T observations, it is possible to recover the stock modulo C, viz. X_t mod C, for all time uniformly, with high probability. Precisely, we prove the statement in theorem 2, viz.

P( sup_{t ∈ {0,...,T}} |(X̂_t − X_t) mod C| > 0 ) ≤ ε,

for all T ≥ max{ (2B/ε)², 4/ε }, where B is a constant that depends on the model parameters.

Proof. From equation (5.4), we know that

P(L^T_{c,i} ≥ L^T_{C,U}) ≤ (1/N) (32p² / P_NR²) (σ_A² + σ_WZ²).    (5.15)
Now we use the result of lemma 11 to obtain N in terms of T. Specifically, note that:

T_N ≤ 4Nk (K/q + P_NR/p),   with probability at least 1 − (1/2)^k,

i.e. N ≥ T / ( 4k (K/q + P_NR/p) ),   with probability at least 1 − (1/2)^k.    (5.16)

Combining equations (5.15) and (5.16), we get:

P(L^T_{c,i} ≥ L^T_{C,U}) ≤ ( 4k (K/q + P_NR/p) / T ) (32p² / P_NR²) (σ_A² + σ_WZ²) + (1/2)^k.    (5.17)

In the above expression, the term (1/2)^k accounts for the probability of the event that N does not satisfy the inequality in equation (5.16). Now, using equation (5.1), we get:
P((Ĉ^T, Û^T) ≠ (C, U)) ≤ (2K²) [ ( 4k (K/q + P_NR/p) / T ) (32p² / P_NR²) (σ_A² + σ_WZ²) + (1/2)^k ].    (5.18)
Or, more simply,

P((Ĉ^T, Û^T) ≠ (C, U)) ≤ Bk/T + (1/2)^k,    (5.19)

where

B ≜ (256 K² p² / P_NR²) (C/q + P_NR/p) (σ_A² + σ_WZ²).    (5.20)

Suppose we choose k such that (1/2)^k ≈ 1/T, i.e. choose k = ⌊log₂ T⌋, so that (1/2)^k ∈ [1/T, 2/T]. Then, we get:

P((Ĉ^T, Û^T) ≠ (C, U)) ≤ (B log₂ T + 2) / T.    (5.21)
In order to make the error probability smaller than ε, we can make each term in the above smaller than ε/2. Thus, we can choose T such that 2/T ≤ ε/2 and (B log₂ T)/T ≤ ε/2. Thus, we get:

P((Ĉ^T, Û^T) ≠ (C, U)) ≤ ε,   ∀ T : (B log₂ T + 2)/T ≤ ε
⟹ P((Ĉ^T, Û^T) ≠ (C, U)) ≤ ε,   ∀ T : (B log₂ T)/T ≤ ε/2, 2/T ≤ ε/2
⟹ P((Ĉ^T, Û^T) ≠ (C, U)) ≤ ε,   ∀ T : (B√T)/T ≤ ε/2, T ≥ 4/ε
⟹ P((Ĉ^T, Û^T) ≠ (C, U)) ≤ ε,   ∀ T : T ≥ max{ (2B/ε)², 4/ε }.    (5.22)

In the third step, we used the fact that for T > 20, √T always dominates log₂ T; we expect T > 20 in any practical setting. The above gives us a bound on the error probability ε in estimating C, U in terms of T. From lemma 1, this is exactly the probability that we recover the stock incorrectly. Hence,

P( sup_{t ∈ {0,...,T}} |(X̂_t − X_t) mod C| > 0 ) ≤ ε,

for all T ≥ max{ (2B/ε)², 4/ε }.
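To make the bound concrete, the following helper (our own naming) evaluates the smallest T guaranteed by proposition 5 for given model parameters and target error ε, with B per equation (5.20), P_NR per (4.35), and the variances per (5.13)-(5.14).

```python
def required_T(p, q, C, K, R, eps):
    """Smallest T guaranteed by proposition 5: T >= max((2B/eps)^2, 4/eps)."""
    P_NR = ((q - q * p) / (q + p - q * p)) ** R            # eq. (4.35)
    var_A = (1 - q) / q ** 2                               # eq. (5.13)
    var_WZ = ((2 - p) * P_NR - P_NR ** 2) / p ** 2         # eq. (5.14)
    B = (256 * K**2 * p**2 / P_NR**2) * (C / q + P_NR / p) * (var_A + var_WZ)
    return max((2 * B / eps) ** 2, 4 / eps)
```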
5.2 Data sufficiency
In this section, we determine the number of time instants we shall have to wait,
in order to see N instances of each random variable defined in lemma 9, with high
probability. The data bounds derived in this section will be useful for deriving our
sample complexity bounds. We prove these bounds in the following lemma:
Lemma 11. With respect to the setup in theorem 1 and section 3.1.1, let N_1(T) = E^T_{C,U}, and let N_2^{c,i}(T) = E^T_{c,i} for all (c, i) ≠ (C, U). For any positive integer N, let T_N be the smallest time such that N_1(T) ≥ N and N_2^{c,i}(T) ≥ N. That is, define a stopping time:

T_N ≜ min{ T : N_1(T) ≥ N, and N_2^{c,i}(T) ≥ N for all (c, i) }.

Then the following holds for any positive integer k:

T_N ≤ 4Nk (K/q + P_NR/p),   with probability at least 1 − (1/2)^k.
Proof. We wish to find the smallest time such that for each (c, i), the set S_{c,i} contains at least N sales. For a particular value of (c, i), this will happen if the total number of sales up to time T exceeds Nc. Thus, in order to satisfy this for all (c, i), we need to observe 2KN sales in total, since by assumption C ≤ 2K.

In order to sell 2KN items, we shall have to refill the stock at most 2N times. This is because each refill adds at least K items, since C ≥ K. Thus, the total time required would be at most equal to the sum of 2K(N + 1) sale times, and the additional waiting time required for 2N refills. Note that we will actually have to wait for a refill only with probability P_NR, since the rest of the time the stock refill would already have occurred before the stock becomes empty. Thus, with probability P_NR, we shall have to wait for a refill, which takes average time 1/p. Also, note that each sale takes time 1/q on average. Using these observations, we can get an upper bound on the expected value of T_N as follows:

E[T_N] ≤ 2KN/q + 2N P_NR/p.

By the Markov inequality, we obtain P(T_N ≥ 2E[T_N]) ≤ 1/2. That is, if we wait for time 2E[T_N], we have at least a 50% chance of seeing the required 2KN sales. Since the above arguments are independent of the starting state, they can be easily extended. By considering k different intervals of size 2E[T_N] each, we can say that the probability of observing 2KN sales in total is at least 1 − (1/2)^k. Thus,

T_N ≤ 2k E[T_N],   with probability at least 1 − (1/2)^k,
∴ T_N ≤ 4kN (K/q + P_NR/p),   with probability at least 1 − (1/2)^k.    (5.23)
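In code form, the bound of equation (5.23) reads as follows (a trivial helper, named by us):

```python
def T_N_bound(N, k, K, p, q, P_NR):
    """Waiting-time bound of eq. (5.23): T_N <= 4kN(K/q + P_NR/p)
    with probability at least 1 - (1/2)**k."""
    return 4 * k * N * (K / q + P_NR / p)
```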
Chapter 6
Generalized and Noise-Augmented models
In this chapter, we describe a more general version of the Sales-Refills model introduced in chapter 2. This model, which we call a Generalized Sales-Refills model, can
represent a larger class of retail systems, or can represent existing markets more accurately, leading to better estimation. We also look at a modification called the Noisy
Sales-Refills model, which augments our basic model with noise-handling capabilities.
We present both models in this chapter in brief, without formal proofs and details.
6.1 Generalized Sales-Refills model

6.1.1 Modifications
Our general model is also a second order Hidden Markov Model with shift 1, representing the evolution of stock. It still has 2 events that can change the stock viz. Sale
and Refill, of which the former is observed. However, there are 3 key changes in the
general model. These are described below.
1. Multiple Sales:
We allow multiple items to be sold in a single instant. Thus, the observation
variables Yt ’s are no longer binary variables, but instead are integers describing
the number of units sold at time t. This is an important feature for 2 reasons.
Firstly, if we are using a notion of real time to generate purchase and non-purchase events, then we are forced to allow for the possibility of multiple sales.
For example, suppose we divide the day into 24 hours, and let each observation
variable represent the total sales made in the corresponding hour. Multiple sales
naturally arise from this definition.
Even if we define non-purchase events using the method described in [6], it
may still be useful to allow for multiple sales. When modelling items that
are frequently sold in bunches, allowing for multiple sales can lead to a much
more accurate/realistic model than breaking them up into consecutive sales.
The latter may not give very good results when modelling sales as independent
geometric random variables.
2. Non-uniform Sales Probability: We also allow the sales probabilities to be
different at different times. In particular, we allow the possibility that current
inventory levels may influence the purchasing patterns of customers. There are
2 important effects that such a stock-dependent model can handle.
Firstly, since we allow for multiple sales, there is an obvious need to allow some
dependence of purchase probabilities on stock, since a customer cannot buy
more items than the current inventory level. Secondly, we allow the tendency
of purchase to be influenced by the stock level. For example, if the stock is very
low, a prospective customer may not see the item on the shelf, and hence fail
to purchase it. Similarly, a customer who is window-shopping would not see
the item, and hence we would miss a possible sale of the item which may have
occurred if the stock was high. Apart from this, there can be other unconscious
influences on the customer behaviour due to the stock level.
3. Unknown R: We may treat R as an unknown parameter, rather than a known
quantity.
We have assumed so far that the refill level R is known. While this assumption
makes perfect sense for a retailer who must know his own store, it may not be
so obvious if the estimation is being done by a point-of-sales system like Square.
In such a situation, we may wish to treat R as an unknown parameter. We
may even wish to estimate it from observations. Unfortunately, the latter is not
possible (without additional knowledge), since both R and p influence the refill
times in equivalent ways. However, as we shall demonstrate, it is possible to
estimate certain other quantities without knowledge of R. Thus, this is not a
fruitless assumption.
6.1.2 Description
Below, we describe the modifications in our formal definition of the Sales-Refills
model, in order to allow the generalizations described above. These modifications
shall correspond to redefining the Sale Event, while the description of the Refill Event
shall be the same as earlier.
1. Sale: Whenever the stock is greater than zero, we expect customers to arrive and sales to occur. Suppose at time instant t the current stock X_t is equal to some integer k > 0; then there is a probability q_k of a sale occurring. Note that the sale probability depends on the current value of the stock. When a sale event occurs, several items may be sold. We allow up to m items to be sold in a single sales transaction. Given that a sale event occurs, the number of items sold is given by the following probability distribution: with probability r_{k,j}, we will sell j items, for each j ∈ {1, 2, . . . , min(m, k)}. Such a sale obviously reduces the stock by j.

Accordingly, we define the "sales decrement" ∆_{s,t} for the general Sales-Refills model as follows:

(1) When X_t > 0: Let X_t = k, for some positive integer k. Then,
    (i) With probability 1 − q_k, no sale happens, meaning that ∆_{s,t} = 0.
    (ii) With probability q_k, a sale occurs. In this case, ∆_{s,t} ∈ {1, . . . , m}, with the probability distribution given by P(|∆_{s,t}| = i) = r_{k,i} ∀ i ∈ {1, 2, . . . , m}.

(2) When X_t = 0: No sale occurs from an empty stock, i.e. ∆_{s,t} = 0.

Note that all the parameters defined above, such as the q_k's and r_{k,j}'s, are unknown parameters in our model. We will later talk about learning these parameters from observations.
2. Refill: We allow refills to occur exactly as described in section 2.2.1 for the ordinary Sales-Refills model. We also define the refill increment ∆_{r,t} exactly as in section 2.2.2.

As before, we define X_{t+1} and Y_t by adding the above quantities with the appropriate sign:

Y_t ≜ ∆_{s,t},
X_{t+1} ≜ X_t + ∆_{r,t} − ∆_{s,t}.

As an illustrative example, suppose we have a sale of 3 items as well as a refill occurring at time instant t; then we obtain the stock at time t + 1 as X_{t+1} = X_t + C − 3.
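To fix ideas, the following is a minimal simulator sketch of the generalized dynamics. It assumes q is a mapping from stock level k to the sale probability q_k, and r[k] a mapping from j to r_{k,j}; the refill event is taken to follow the basic model, i.e. when the stock is at most R, a refill of C units occurs with probability p at each instant.

```python
import random

def simulate_generalized(T, C, R, p, q, r, m, x0, seed=0):
    """Sample path (X, Y) of the generalized Sales-Refills model."""
    rng = random.Random(seed)
    X, Y = [x0], []
    for _ in range(T):
        k = X[-1]
        ds = 0
        if k > 0 and rng.random() < q[k]:                 # sale event
            js = list(range(1, min(m, k) + 1))
            ds = rng.choices(js, weights=[r[k][j] for j in js])[0]
        dr = C if (k <= R and rng.random() < p) else 0    # refill event
        Y.append(ds)                                      # Y_t = Delta_{s,t}
        X.append(k + dr - ds)                             # X_{t+1}
    return X, Y
```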
6.2 Estimation of the Generalized Sales-Refills model
In this section, we talk about estimating parameters from data. It turns out that the
Generalized Sales-Refills model is too complex for recovering all parameters merely
from observations of sales. We deal with this by imposing certain simplifying assumptions, which reduce the parameter space. While many such sets of assumptions are
possible in theory, we will discuss one in detail below.
6.2.1 Simplifying Assumptions
We describe here a set of assumptions which are quite reasonable for a large number of situations. We call these assumptions the Uniform Aggregate Demand condition. This involves the following 3 constraints:

(i) ∀ k : q_k = q.
(ii) ∀ k ≥ m : for all j, r_{k,j} = w_j.
(iii) ∀ k < m : for all j < k, r_{k,j} = w_j; for j = k, r_{k,k} = Σ_{i=k}^{m} w_i.
The above assumptions can be understood intuitively. The underlying idea is this: we assume that customers arriving at the store form a homogeneous distribution, which does not depend on the stock. Moreover, we assume that each customer arrives with a fixed intent of buying a certain number of items. However, the actual number of items purchased by the customer depends not just on her intent, but also on the current inventory at the store. If the inventory level of the store is lower than the number of items the customer wants to buy, we assume that she simply buys the available number of items. In other words, the customer buys out the particular item, and the item goes out of stock.
It can be seen how this leads to the assumptions made above. For example, consider the uniformity of the q_k's. Since every customer who arrives at the store leaves only after making a purchase, the probability of sale is not affected by the current inventory level, as long as it is non-empty. Similarly, we can understand the 2nd assumption. It says that when there is enough stock, the customer preferences are uniform. The 3rd assumption is more interesting. This assumption says, for example, that when the stock equals either 3 units or 10 units, the probability of a customer purchasing 2 units is the same. This is to be expected, because in both cases, the only customer who purchases 2 units is the person who came with the intent of buying exactly 2 items. However, when the stock equals 3 units, customers who come with the intention of buying 4 units, 5 units, or any higher amount, also leave after purchasing 3 units. Thus the demand for the current number of items in stock is an aggregate of the customers who come with the intent of buying the same number, or any higher number, of items.
The above assumptions reduce the set of independent parameters on the sales side
to m + 1 parameters: q, w1 , w2 , . . . , wm . Although some such reduction is necessary
for learning the parameters, the constraints can be different and more general than
what we have imposed here.
One may also contrast the above scheme with another possibility, where the customer buys only the number of items she originally intended to buy, and leaves the
store if sufficient stock is not available. Such a scheme will lead to a different set of
constraints on the parameters, which can be worked out. One interesting feature is
that we shall see a dependence of the sales probability on stock level, in this case. In
general, then, the right set of assumptions would depend on the application.
6.3 Estimation with Uniform Aggregate Demand
Assuming the properties described in section 6.2.1, we now proceed to estimation. We can obtain the following estimation results for the General Sales-Refills model under Uniform Aggregate Demand.

Theorem 3 (Consistent Estimation of Generalized model). For the model described in section 6.1 with the assumptions in section 6.2.1, assume that C is known while R may be unknown. Then, there exist estimators q̂^T, Ĉ^T, X̂_0^T, p̂^T, X̂_t^T, and ŵ_i^T based on the observations Y_0^T = (Y_0, Y_1, . . . , Y_T), so that as T → ∞,

(q̂^T, X̂_0^T, X̂_t^T, ŵ_i^T) →a.s. (q, X_0 mod C, X_t mod C, w_i).    (6.1)

6.3.1 Estimation of stock and U
Our estimation algorithm is a direct extension of the algorithm for the basic model, described in chapter 3. That is, we define the index sets S^T_{C,i}, the sales events E^T_{C,i}, and the average window lengths L^T_{C,i} almost exactly as defined in section 3.1.1. Notice that C is known, so our sets are indexed by a single parameter, viz. i. We also need a slight modification in the definition of E^T_{C,i}, to take into account all instants when a sale occurs, rather than instants when exactly one item is sold. Thus, we have the following definitions:

S^T_i ≜ { t ∈ {1, . . . , T} : Σ_{j=0}^{t−1} Y_j ≡ i mod C }   ∀ i,    (6.2)

Ẽ^T_i ≜ Σ_{t ∈ S^T_i} 1(Y_t ≥ 1)   ∀ i,    (6.3)

L^T_i ≜ |S^T_i| / Ẽ^T_i.    (6.4)
As before, we argue that the window length will be much longer for those periods which correspond to a stock of 0 mod C, because in these cases we need to wait for a refill (assuming no early refill) as well as a sale. Therefore, a simple estimator for U works:

Û^T = arg max_i L^T_i.    (6.5)

Once we have estimated U, the stock at any time can be estimated as before:

X̂^T_t ≜ ( Û^T − Σ_{j=0}^{t−1} Y_j ) mod C.

We claim that these estimators converge almost surely. That is, as T → ∞,

Û^T →a.s. U   and   X̂^T_t →a.s. X_t.    (6.6)

The proof is identical to the proof in the basic case.
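A compact sketch of equations (6.2)-(6.5) and the stock estimator (numpy is our implementation choice):

```python
import numpy as np

def estimate_U_and_stock(Y, C):
    """Û^T of eq. (6.5) and X̂_t = (Û^T - sum_{j<t} Y_j) mod C."""
    Y = np.asarray(Y)
    before = np.concatenate(([0], np.cumsum(Y)[:-1]))   # units sold before t
    L = np.zeros(C)
    for i in range(C):
        mask = (before % C) == i                        # S^T_i, eq. (6.2)
        n_sales = (Y[mask] >= 1).sum()                  # sale events, eq. (6.3)
        L[i] = mask.sum() / max(n_sales, 1)             # L^T_i, eq. (6.4)
    U_hat = int(np.argmax(L))                           # eq. (6.5)
    X_hat = (U_hat - before) % C                        # stock mod C at each t
    return U_hat, X_hat
```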
6.3.2 Estimation of sales parameters

We will now discuss the estimation of the two other quantities mentioned in theorem 3, viz. q and the w_i's.

Estimating q:

Our estimator for q has exactly the same form as before. Thus, we define the following quantity:

T_S^T ≜ ( Σ_{i ∈ {0,...,C−1}, i ≠ U} |S^T_i| ) / ( Σ_{i ∈ {0,...,C−1}, i ≠ U} Ẽ^T_i )   (average sale time),

and our estimator for q is derived from the above as:

q̂^T = 1 / T_S^T.
Estimating w_j's:

We will now outline the procedure for estimating the w_j's, for j ∈ {1, . . . , m}. Our basic idea is to use those instants where the stock exceeds j, so that the fraction of sold items directly gives us w_j.

Consider again the set S_i, viz. the set of time instants t where X_t − X_0 = i mod C, for all i ∈ {0, 1, . . . , C − 1}. Define a family of numbers Ẽ_{i,j} to represent the number of instants in S_i such that Y_t = j. That is,

Ẽ^T_{i,j} ≜ Σ_{t ∈ S^T_i} 1(Y_t = j)   ∀ i, j.    (6.7)

Now, note that S^T_i represents those instants when the stock is congruent to U − i modulo C. Hence, S^T_{(U−k) mod C} represents time instants when the stock equals k. It follows that Ẽ^T_{(U−k) mod C, j} is the number of instants when exactly j items are sold from a stock of k items. Since U is not known a priori, we can replace U by Û^T in these equations. Thus, our consistent estimators for the parameters w_j are as follows:

ŵ_j^T = ( Σ_{k : j+1 ≤ k ≤ C−1} Ẽ^T_{(U−k) mod C, j} ) / ( Σ_{k : j+1 ≤ k ≤ C−1} Ẽ^T_{(U−k) mod C} )   ∀ j ∈ {1, 2, . . . , m}.    (6.8)

A careful look at the above expression shows that it is simply the empirical estimator for a multinomial distribution. However, we only use those time instants for estimation when the stock available is greater than or equal to j + 1. This follows from how the sales probabilities are defined. Note that in the above expression, we have implicitly used the assumption that m < C. If this assumption does not hold, we will have to use a more complex estimation scheme.

It is easily seen via the Strong Law of Large Numbers that our estimators for q and the w_j's converge almost surely.
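A corresponding sketch for q̂^T and the ŵ_j^T of equation (6.8); it assumes m < C and takes the estimated U as input.

```python
import numpy as np

def estimate_sales_params(Y, C, U_hat, m):
    """q̂^T = 1 / (average sale time) and ŵ_j^T per eq. (6.8)."""
    Y = np.asarray(Y)
    before = np.concatenate(([0], np.cumsum(Y)[:-1]))
    idx = before % C
    q_hat = (Y[idx != U_hat] >= 1).mean()     # sale events per instant, i != U
    w_hat = {}
    for j in range(1, m + 1):
        # instants whose stock level k satisfies j+1 <= k <= C-1
        ks = [(U_hat - k) % C for k in range(j + 1, C)]
        mask = np.isin(idx, ks)
        n_sales = (Y[mask] >= 1).sum()        # denominator of eq. (6.8)
        w_hat[j] = (Y[mask] == j).sum() / max(n_sales, 1)
    return q_hat, w_hat
```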
6.4 Augmenting the Sales-Refills model with Noise
The key drawback of our previous models, with respect to practical applications, is the assumption that the retail market is a perfect process. In other words, no items are sold, lost, or otherwise changed without being recorded as a transaction in the system, except in a refill. Moreover, the stock also changes in a very precise manner during refills, i.e. by exactly C items, for some fixed constant C. Both of these assumptions may be violated in practice. To demonstrate that these assumptions are rather farfetched, we may consider what would happen if both of them were true. Knowing the stock at one time instant would then allow us to essentially determine its value forever! In settings where such assumptions held in practice, inventory inaccuracy would not be a problem in the first place. Of course, in the real world the problem of inventory inaccuracy is ubiquitous, which means that these assumptions are frequently untrue in practice. Thus, the model is inadequate for solving the problem of inventory estimation.
To make our model useful in practice, we have to do away with these assumptions.
In particular, we must allow for goods to be lost in other ways besides sales, such as
theft, misplacement, poor handling, etc. From a practical viewpoint, it is virtually
impossible to record or track such losses, hence they are unobserved. One way to
model these losses is to allow a certain number of ‘hidden sales’ in each refill-cycle.
These hidden sales reduce the stock just like ordinary sales, but they are not recorded
in the transactions and hence not observable. They can occur anywhere and are
independent of the usual sales and refills. In addition to losses of goods, we might
also wish to account for losses incurred while refilling the stock. We call this flexibility 'source uncertainty'. As we show later (section 6.4.2), the two relaxations above are equivalent with respect to observable quantities.
It may be noted that above we have only considered the possibility that the true
stock is smaller than the recorded stock, and not the other way. While this is usually
the case in practice, there are also a few effects that lead to the true stock being
larger. For example, suppose the cashier accidentally marks a purchased item of type
A as type B, so that the real stock of B is higher than the recorded stock. If we
wish to handle these influences in our model, we can allow for ‘hidden negative sales’
which increase the stock by 1. The analysis of this case carries over exactly like the analysis we describe here. For ease of description, we shall here only consider events which reduce stock.
6.4.1 Description of Noisy Sales-Refills Model
We describe one possible means of extending the basic Sales-Refills model to handle noise. We start with the same basic idea of a Markov Chain to model the evolution of stock, with 'sale' and 'refill' processes allowed to change the stock. In addition to these, we shall now allow a new type of event to change the stock, which we call a 'hidden sale'. A hidden sale reduces the stock by a single unit, just like an ordinary sale. However, unlike the latter, hidden sales are not observed, i.e. there is no information about them in the observed (i.e. Y) variables. Another difference from sales is that multiple hidden sales can occur in a single time instant, and their occurrence is unaffected by ordinary sale or refill events.
To make our model estimable, we make further assumptions. We assume that in each refill-cycle, there can be at most L hidden sales. This means that, of the C goods ordered in any refill, not more than L goods will be lost due to wear and tear. To be useful, we require that L be a small number, i.e. L/C ≪ 1. We can see that this is a reasonable assumption in practice. Moreover, even if this assumption is not strictly satisfied during all refill cycles, our model remains fairly accurate, provided that the average rate of losses is small, and the per-cycle deviations from this rate are not too large.

We call the above model a Noisy Sales-Refills Model.
6.4.2 Equivalence of hidden sales and source uncertainty viewpoints
The motivation for our Noisy model stems from losses of goods in real markets. These losses could happen either while restocking or during regular sales. It turns out that as far as observable quantities are concerned, these two types of losses are indistinguishable. Hence, for the purpose of estimation, we can assume that all losses of goods occur during a refill at the beginning of each cycle.
The reason for this equivalence is as follows. Suppose in a refill cycle, we have 100 items added to an empty stock (i.e. C = 100), of which 5 are lost during restocking (i.e. source loss = 5), and another 5 items are lost during sales. Then the item will be in stockout after selling 90 items, and hence we shall observe a (potentially) longer gap between the 90th and 91st sale, as we may have to wait for a refill before observing the next sale. Once a refill occurs, the stock shall again be at 100 items. Note that even if 8 items were lost at source and 2 during purchases, or vice-versa, we would observe exactly the same kinds of inter-sale gaps. Similarly, for the items lost during purchases, the exact timing of these losses does not affect, and hence cannot be estimated from, the observations. Thus, for purposes of estimation, it is valid to assume that all losses occur during any single time instant in a cycle. We shall take that time instant to be the beginning of the cycle.
There is another advantage of this viewpoint. Suppose we wish to estimate a modification of our Sales-Refills model, where the refill policy is a pure (s, S) policy. In other words, the amount of items added during a refill is not a constant C, but can vary between S − s and S, depending on the inventory level at the time of refill. Since refill times are not observed in our Sales-Refills model, there is no way to know the exact number of items added in a refill. However, we can model the variability in the refill amount as a pseudo-loss of goods. In terms of the notation described in section 6.4.1, we simply replace L by L + s. Thus, the Noisy Sales-Refills model also allows us to handle variable refills.
6.4.3 Estimation with Noise

Estimation Idea:
We will give a brief outline of the estimation procedure for the noisy model. Our aim is to estimate the stock at each time, from the observed sales. To do this, we will look at various 'pathways' taken by the stock, depending on the number of items lost in each cycle. Depending on the pathway of the stock, we obtain a sequence of inter-sale intervals which correspond to sale+refill intervals. Hence, we expect these intervals to be longer than their neighboring intervals. By suitably guessing the pathway of the stock, and trying to maximize the average sale+refill interval length, we obtain an approximate estimate for the evolution of stock. The estimate is approximate because it is impossible to know the stock more accurately than to within L items. This is because the exact time when goods are lost is not discernible within a cycle. Assuming that we guess all the refill intervals correctly, we shall then obtain an estimate of the stock accurate to within L items. This is the best that can be achieved by any algorithm, under the assumptions of this model.
Limitations:

There are several limitations of the estimation process under noise, compared to the earlier model. For example, since not all refill intervals involve waiting for a refill (we may have early refills), some refill intervals will be non-observable. Even if there are no early refills, a refill interval may have a very small sale and refill time, just by chance. In that case, it may be difficult to discern the correct refill interval. However, such events will only occur with a small probability, and moreover, correlating information from multiple cycles can help to alleviate this problem to some extent. The correlation among adjacent cycles also leads to some robustness properties enjoyed by this model. In particular, the assumption of losses being restricted to fewer than L items can be violated for a few cycles, as long as the surrounding intervals have fewer lost items to compensate. That is, the instantaneous loss count can be larger than L, provided the 'average loss rate' is upper bounded by L.
Consistency:

Another property, important from a theoretical point of view, is that the hidden state in the Noisy Sales-Refills model cannot be consistently estimated from data. The proof is straightforward. Consider two possibilities: in one case, L items are lost in each of C/L consecutive cycles; and in the other case, no items are lost during the same period (i.e. C/L − 1 cycles). Since the final stock at the end of this period is the same in both cases, we cannot make any distinction between these two possibilities on the basis of events outside a finite interval. By design, our model cannot distinguish between the two possibilities above with 100% certainty, using only a finite amount of data. Hence, we cannot consistently say whether the number of refills in this period equals C/L or C/L − 1. This implies that we must have large errors (≥ C/2) in estimating the stock, with non-zero probability. Hence, we say that the model is not consistently estimable.

At first glance, this may seem like a drawback, but the lack of consistency can also be interpreted as a feature. This is because any model which represents the inventory process in a realistic setting should not be consistent. To put it another way, real-world processes are noisy and do not have infinite memory. The lack of consistency is a direct outcome of this finite-memory property of the Noisy Sales-Refills model. Thus, this model provides a better representation of reality.
Chapter 7
Conclusion
7.1 Summary of contributions
We recap the key contributions described in this work.
1. We provide a method for solving the inventory estimation problem in retail
markets, using Hidden Markov Models.
2. We describe a family of Hidden Markov Models, viz. Sales-Refills models, that
can be learnt from data in a computationally and statistically efficient manner.
3. We describe computationally fast estimation algorithms for learning parameters
from data. We prove the correctness of our estimators by showing that they
converge almost surely to the true parameters.
4. We derive finite-sample error bounds for our estimators. For T observations,
these provide an upper bound on the error in stock estimation, for all time.
5. We discuss a generalized version of the Sales-Refills model, and describe estimators for this model.
6. We provide some ideas for extending our Sales-Refills model to handle noise.
7.2 Significance and Future Work
Our work shows the possibility of using Hidden Markov Models for accurate estimation of inventory from sales data. We provide an example of a highly learnable
HMM, for which nearly all parameters of interest can be learnt from observations,
using estimators that converge almost surely. Moreover, the estimation methods we
provide are computationally fast and statistically efficient. Our model, in addition to
solving the problem of inventory estimation, can be used to perform accurate demand
forecasting in the presence of unobserved lost sales. Thus, it can be potentially used
to solve a major problem in the retail industry.
A key drawback of our primary Sales-Refills model is the lack of ‘noise’. In
real markets, lost goods are common due to wear and tear, theft, mishandling, and
other processes, which are unobserved (cf. [7]). In order to deal with such losses,
we need to find good models which account for them. One such model is suggested
at the end of this thesis (section 6.4.1), which may work. But this model needs to
be formalized, and further error analysis is necessary, to gauge its usefulness in real
scenarios. Another important feature would be taking into account side information
about inventory, such as information obtained through inventory inspections. We
leave these to future work.
The other important contribution of this thesis is to the field of HMM learning. As
is widely known, HMM learning is a hard problem except for a few special cases. Our
work provides an instance of an HMM that is easy to describe, efficiently learnable,
and of practical utility. Moreover, we argue that this HMM is not solved by any of the
existing methods in the literature. Thus, it adds usefully to our existing knowledge
about HMMs. We provide a brief survey of a spectral method from the literature,
explaining how it can be used for HMM learning. We also list several drawbacks of this technique, illustrating that it fails entirely on our specific HMM model. This reinforces
our argument that the currently known HMM learning methods are limited, and our
work is a novel addition to this set. It may be possible to expand our contributions
even further, by identifying the properties which make the given model learnable, and extending these to other, more powerful models. We leave this analysis to future work.
Bibliography
[1] N. DeHoratius, A. J. Mersereau, and L. Schrage, “Retail Inventory Management When Records Are Inaccurate,” Manufacturing & Service Operations Management, vol. 10, no. 2, pp. 257–277, 2008.
[2] N. DeHoratius and A. Raman, “Inventory Record Inaccuracy: An Empirical Analysis,” Management Science, vol. 54, no. 4, pp. 627–641, 2008.
[3] K. Sari, “Inventory inaccuracy and performance of collaborative supply chain
practices,” Industrial Management & Data Systems, vol. 108, no. 4, pp. 495–
509, 2008.
[4] Y. Kang and S. B. Gershwin, “Information inaccuracy in inventory systems:
stock loss and stockout,” IIE Transactions, vol. 37, no. 9, pp. 843–859, 2005.
[5] T. W. Gruen, D. S. Corsten, and S. Bharadwaj, Retail Out of Stocks: A Worldwide Examination of Extent, Causes, and Consumer Responses. 2002.
[6] J. M. Chaneton, G. V. Ryzin, and M. Pierson, “Estimating inaccurate inventory
with transactional data,” Preprint, pp. 1–18, 2014.
[7] L. Chen, “Fixing Phantom Stockouts: Optimal Data-Driven Shelf Inspection Policies,” Working Paper, Duke University, pp. 1–37, 2013.
[8] N. Agrawal and S. A. Smith, “Estimating negative binomial demand for retail
inventory management with unobservable lost sales,” Naval Research Logistics,
vol. 43, no. 6, pp. 839–861, 1996.
[9] G. Vulcano, G. van Ryzin, and R. Ratliff, “Estimating Primary Demand for Substitutable Products from Sales Transaction Data,” Operations Research, vol. 60,
no. 2, pp. 313–334, 2012.
[10] S. Nahmias, “Demand estimation in lost sales inventory systems,” Naval Research Logistics, vol. 41, no. 6, pp. 739–757, 1994.
[11] S. A. Terwijn, “On the Learnability of Hidden Markov Models,” pp. 261–268,
2002.
[12] R. B. Lyngsø and C. N. S. Pedersen, “Complexity of comparing hidden Markov
models,” Lecture Notes in Computer Science (including subseries Lecture Notes
in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 2223 LNCS,
pp. 416–428, 2001.
[13] N. Abe and M. K. Warmuth, “On the Computational Complexity of Approximating Distributions by Probabilistic Automata,” Machine Learning, vol. 9, pp. 205–
260, 1992.
[14] H. Scarf, “The optimality of (S,s) policies for the dynamic inventory problem,”
1960.
[15] S. P. Sethi and F. Cheng, “Optimality of (s, S) Policies in Inventory Models with
Markovian Demand,” Operations Research, vol. 45, no. 6, pp. 931–939, 1997.
[16] A. Anandkumar, D. P. Foster, D. Hsu, S. M. Kakade, and Y.-K. Liu, “A Spectral Algorithm for Latent Dirichlet Allocation,” Advances in Neural Information
Processing Systems, vol. 25, pp. 1–9, 2012.
[17] R. E. Schapire, M. Kearns, Y. Mansour, D. Ron, R. Rubinfeld, and L. Sellie,
“On the learnability of discrete distributions,” Proceedings of the twenty-sixth
annual ACM symposium on Theory of computing, pp. 273–282, 1994.
[18] Z. Zivkovic and F. van der Heijden, “Recursive unsupervised learning of finite mixture models.,” IEEE Trans. Pattern Analysis and Machine Intelligence,
vol. 26, no. 5, pp. 651–6, 2004.
[19] A. Moitra and G. Valiant, “Settling the polynomial learnability of mixtures of
Gaussians,” Proceedings - Annual IEEE Symposium on Foundations of Computer
Science, FOCS, pp. 93–102, 2010.
[20] S. Vempala and G. Wang, “A spectral algorithm for learning mixture models,”
Journal of Computer and System Sciences, vol. 68, no. 4, pp. 841–860, 2004.
[21] E. Mossel and S. Roch, Learning nonsingular phylogenies and hidden Markov
models, vol. 16. 2006.
[22] D. Hsu, S. M. Kakade, and T. Zhang, “A Spectral Algorithm for Learning Hidden
Markov Models,” 2008.
[23] A. Anandkumar, R. Ge, D. Hsu, S. M. Kakade, and M. Telgarsky, “Tensor decompositions for learning latent variable models,” arXiv preprint
arXiv:1210.7559, vol. 15, pp. 1–55, 2014.
[24] Q. Huang, R. Ge, S. Kakade, and M. Dahleh, “Minimal Realization Problems
for Hidden Markov Models,” in 52nd Annual Allerton Conference on Communication, Control and Computing, (Allerton), pp. 1–14, IEEE, 2014.