Blind Nonparametric Revenue Management:
Asymptotic Optimality of a Joint Learning and Pricing Method∗
Omar Besbes† (Columbia University)
Assaf Zeevi‡ (Columbia University)
First submitted: March 29, 2006
Current version: August 29, 2006
Abstract
We consider a general class of network revenue management problems in which multiple
products are linked by various resource constraints. Demand is modeled as a multivariate
Poisson process whose instantaneous rate at each point in time is determined by a vector of
prices set by the decision maker. The objective is to price the products so as to maximize
expected revenues over a finite sales horizon. The decision maker observes realized demand over
time, but is otherwise “blind” to the underlying demand function which maps prices into the
instantaneous demand rate. Few structural assumptions are made with regard to the demand
function; in particular, it need not admit any parametric representation. We introduce a general
method for solving such blind revenue management problems: first a learning phase experiments
with a “small” number of prices over an initial “short” time interval; then a simple optimization
problem is solved using an estimate of the demand function obtained from the previous stage, and
a near-optimal price is fixed for the remainder of the time horizon. To evaluate the performance
of the proposed method we compare the revenues it generates to those corresponding to the
optimal dynamic pricing policy that knows the demand function a priori. In a regime where the
sales volume grows large, we prove that the gap in performance is suitably small; in that sense,
the proposed method is asymptotically optimal.
Keywords: Revenue management, pricing, nonparametric estimation, learning, asymptotic
optimality.
∗ Research partially supported by NSF grant DMI-0447562.
† Graduate School of Business, e-mail: ob2105@columbia.edu
‡ Graduate School of Business, e-mail: assaf@gsb.columbia.edu

1 Introduction
Revenue management is a growing subfield of operations research which deals with modeling and
optimizing complex pricing and demand management decisions. One of the central problems in
this area is the so-called dynamic (or tactical) pricing problem, a typical instance of which is the
following: given an initial level of inventory and a finite selling horizon, dynamically price the
products being sold so as to maximize the total expected revenues. In this problem it is implicitly
assumed that there is little or no control over inventory throughout the time period over which sales
are allowed, and pricing is the main lever used to optimize profits. There are numerous examples
of revenue management problems that fall into this category, arising in industries ranging from
fashion and retail to air travel, hospitality and leisure. (For a general overview see Talluri and
van Ryzin (2005) and the survey papers by Elmaghraby and Keskinocak (2003) and Bitran and
Caldentey (2003).)
Most traditional work in the revenue management area assumes that the functional relationship
between the expected demand rate and price, often referred to as the demand function or demand
curve, is known to the decision maker, and the only form of uncertainty is due to randomness of
demand realizations; in what follows we shall refer to this as the “full information” setting. This
assumption is violated in several scenarios that one encounters in practice, where a firm might
only possess limited information with regard to the demand function. This issue is especially
relevant for fashion items and technical products with short life cycles, but it also arises in many
other instances in which limited historical sales data prohibit accurate inference of the price-demand
relationship. In such cases one can view the assumption of “full information” as more of a convenient
mathematical abstraction (which facilitates studying the structural properties of optimal pricing
policies), rather than an accurate description of the information available to the decision maker.
Relatively few studies in the revenue management literature consider uncertainty with regard to
the demand function, and most of this line of work is pursued within the context of a parametric
structure in which one or more parameters are not known; more will be said on this in the literature
review which is deferred to Section 2.
The main purpose of this paper is to generalize the traditional full information formulation
to account for significant uncertainty in the demand function, and in this context to propose
tractable solutions to the dynamic pricing problem. In doing so, we also derive qualitative insights
that pertain to the practice of “price testing” which is in widespread use in a variety of revenue
management applications.
We consider a general class of network revenue management problems in which multiple products
are linked by various resource constraints. Demand is modeled as a multivariate Poisson process
whose instantaneous rate at each point in time is determined by a vector of prices set by the
decision maker. (The Poisson process model is adopted primarily for concreteness and simplicity of
exposition; see the discussion in Section 6.) The objective is to price the products so as to maximize
expected revenues over a finite selling horizon. Unlike the traditional setting of the tactical pricing
problem, we assume that the decision maker does not know the true underlying demand function,
and is only able to observe realized demand over time. Few assumptions are made with regard to
the structure of the underlying demand function; in particular, it need not admit any parametric
representation. The decision maker is therefore faced with nonparametric uncertainty, and in that
sense s/he is effectively “blind” to the true model.
Consider the optimal revenues in the full information version of the tactical pricing problem
described above (note that the latter can be achieved using a dynamic pricing policy which is
derived from the underlying demand function using dynamic programming). In the blind setting we
consider, where meaningful information characterizing the demand function is absent, the following
fundamental question arises:
Do there exist blind pricing strategies that compensate for incomplete information by
learning the demand behavior “on the fly,” while guaranteeing performance that is close
to the maximum achievable in the full information setting?
Of course, even the optimal policy that is constructed in the absence of upfront knowledge of
the demand function is expected to generate significantly less revenues than its full information
counterpart.
Summary of the main results. In this paper we show that the full information benchmark
can be approached, which in a sense provides a positive answer to the question posed above. This
is done by developing a method in which estimation (demand learning) and optimization (pricing)
are executed over consecutive non-overlapping time intervals. A rough algorithmic description of
the method is as follows: i.) a “short” initial segment of the selling horizon is used to learn the
demand function by experimenting with a “small” number of prices; ii.) a deterministic relaxation
of the revenue maximization problem is formulated based on the estimated demand function; and
iii.) a fixed price solution of the optimization problem is used for the remaining selling season (see
Algorithms 1 and 2). It is important to note that the approach taken in the learning phase is
nonparametric as the true demand function need not possess any parametric structure.
With regard to the performance of these blind pricing policies, we prove that they are nearly
optimal in the following sense: the ratio of revenues generated by the proposed policies to those
achieved by the optimal dynamic pricing policy in the full information setting approaches one as the
sales volume grows large (see Theorems 1 and 2). We also derive bounds on the magnitude of the
optimality gap which hold uniformly over the class of admissible demand functions, and illustrate
the characteristics and efficacy of the method via several numerical examples. In moderate sized
problems, where inventories are of the order of hundreds or a few thousands of units, our proposed method typically achieves in excess of 70% (and may achieve as much as 90%) of the optimal full information revenues.
Main insights and bearing on revenue management practices. One of the main take-away messages of our study is that the value of knowing the functional relationship between price and demand is, perhaps somewhat surprisingly, not as high as one may have anticipated. In
particular, by judiciously combining real-time demand learning and price optimization, the decision
maker is able to achieve close to the optimal full information revenues (despite the fact that s/he
is given little prior information with regard to the demand function).
A useful insight that arises from our analysis is related to the industry practice of “price testing,”
a prevalent method used by firms to address the lack of precise demand information; a recent empirical study of 32 large U.S. retailers finds that nearly 90% of them conduct price experiments
(see Gaur and Fisher (2005)). The main idea is quite straightforward and closely related in spirit
to the structure of our algorithm: in the first step one experiments with several prices; and in the
second step one selects the price(s) that are expected to optimize revenues based on the demand
observed in the previous step. Among the main questions that arise in this context is how many
prices to test, and for how long. Current practices are mostly guided by ad hoc considerations in
addressing such issues. Given the significant role that price testing plays in revenue management
practices, there is a growing need to improve the understanding of this approach and add to its
rigorous foundations (see, e.g., Williams and Partani (2006) for further discussion and examples).
Our analysis contributes to this goal by providing simple and intuitive guidelines for selecting both
the number of prices that should be tested, as well as the overall fraction of the selling season that
should be dedicated to experimentation. The main takeaway here is that carefully constructed
price testing schedules extract sufficient information on the demand model, and hence enable the
firm to optimize profits over the entire selling season.
The reader may wonder whether implementing complex price testing schedules is at all feasible
in practice. The positive answer to this owes, to a large extent, to an important revolution in retail
and manufacturing industries which hinges on the Internet and the Direct-to-Customer model. This
business model, that has become synonymous with such industry giants as Dell Computers and
Amazon, facilitates the implementation of price changes and removes traditional sources of friction
(e.g., catalog printing). Recent advances in information technology and the Internet have also made
it simple to test prices across different customer accounts, where each individual account is only
able to observe the prices offered to it and not to other accounts. This greatly facilitates learning
about customer preferences and sensitivity to price, which is one of the main building blocks in
assembling a demand curve structure. (It is worth noting that the use of “testing” ideas is not
limited to prices; see Fisher and Rajaram (2000) for a study involving merchandise testing.)
Finally, an important insight that emerges from our study pertains to the risk associated with
modeling assumptions. In particular, consider the following dilemma that arises frequently in revenue management applications: should one assume a specific parametric structure for the demand function, or adopt a more elaborate model? The former is simple to calibrate but runs a significant
risk of being misspecified relative to the true underlying demand function. The latter is potentially
more difficult to estimate, yet may reduce the risk of model misspecification. The current paper
demonstrates the feasibility of a learning and pricing approach that provides performance guarantees which hold regardless of the nature of the underlying demand function. This implies that the
risk of significant revenue loss due to misspecification can be effectively mitigated if one adopts
such nonparametric approaches.
The remainder of the paper. The next section provides a review of the related literature and
positions the contributions of the present paper. Section 3 introduces the model and formulates
the problem. Section 4 presents the nonparametric pricing algorithm and gives the main results
concerning its asymptotic performance. Section 5 provides illustrative numerical examples, and
Section 6 contains some concluding remarks and discussion of the modeling assumptions. All
proofs are collected in two appendices: Appendix A contains the proofs of the main results and
Appendix B contains proofs of auxiliary lemmas.
2 Related Literature and Positioning of the Paper
Blind revenue management falls into a broader category of problems which involve dynamic optimization under incomplete information. Such problems have been the topic of investigation in
Operations Management, Computer Science, Game Theory and Economics; we next briefly survey the most relevant studies in these streams of literature.
Bayesian approaches and parametric uncertainty. The most common framework that is
used to study dynamic optimization under incomplete information is a Bayesian modification of the
dynamic programming principle. In particular, when uncertainty is couched in parametric terms,
the approach works roughly as follows. First, a prior distribution is assumed for the unknown
parameters, and the state-space of the dynamic program is augmented to include information
accumulated over time. Then the prior is updated using the observations and the modified Bellman
equation is solved. The approach dates back to Scarf (1959) who formalized it in the context of a
periodic review inventory problem.
Since most revenue management studies assume that the demand function has a known parametric structure (for purposes of deriving qualitative insights), the Bayesian approach articulated
above has essentially been the method of choice to incorporate demand function uncertainty. Examples of such work include Aviv and Pazgal (2005) and Araman and Caldentey (2005), both of
which restrict attention to a single product problem with one unknown parameter characterizing
the demand function; see also Afèche and Ata (2005) for an application in queueing systems and
Keller and Rady (1999) for an example in the Economics literature.
A common problem with the Bayesian-based analysis involves the introduction of a new state
“variable” that summarizes demand information accumulated over time. This often necessitates an
approximation of the value function to avoid the curse of dimensionality. For examples of such an
analysis in the context of a single product pricing problem where the demand function is linear, see
Lobo and Boyd (2003) and Carvalho and Puterman (2005); the performance of the policies proposed in both studies is only evaluated via simulation. (For a non-Bayesian version of this problem that
uses least-squares to update parameter values see Bertsimas and Perakis (2003).)
The Bayesian formulation suffers from additional and more fundamental shortcomings. First,
and foremost, it essentially restricts the demand function to have a parametric structure. One could
argue that in most real-world problems this is a dubious hypothesis. Second, it requires as input a
prior distribution which is typically chosen to support closed form calculation of the posterior (by
restricting attention to conjugate families of distributions). This issue is perhaps less crucial in a
problem that just involves statistical estimation where eventually the effect of the prior typically
“dissipates.” In contrast, the objective function in the dynamic optimization problem is obtained
by taking an expectation relative to the prior distribution, and hence any notion of optimality
associated with a Bayesian-based policy hinges on the chosen prior. Consequently, the Bayesian
approach is significantly prone to model misspecification, a consequence of both the parametric
assumptions and the arbitrary choice of prior.
Nonparametric approaches. There are relatively few studies in the literature that focus on
nonparametric approaches. Larson et al. (2001) study a classical stochastic inventory problem where
the demand distribution is not known and show that a history dependent (s, S) policy is optimal.
In the revenue management literature, the work of van Ryzin and McGill (2000) investigates data-driven adaptive algorithms for determining airline seat protection levels. The absence of parametric
assumptions in this case is with respect to the distributions of customers’ requests for each class.
Another recent application in revenue management is discussed in Rusmevichientong et al. (2006)
who formulate a nonparametric approach to a multiproduct single resource pricing problem. Their
work focuses on a static optimization problem in which a car manufacturer seeks fixed prices that
maximize revenues based on historical preference data and does not involve tradeoffs between
learning and pricing.
While the studies above are data-driven, some recent studies have also focused on settings where
the decision maker lacks any ability to learn. Lim and Shanthikumar (2005) formulate a robust
counterpart of the single product revenue management problem of Gallego and van Ryzin (1994)
where the uncertainty arises at the level of the point process distribution characterizing the customers’ requests. The authors use a max-min formulation where nature is adversarial at every
point in time. This leads to a conservative formulation which, in addition, is not prescriptive. (Other
approaches to modeling uncertainty include competitive ratio analysis and minimax regret; see for
example Eren et al. (2006) and Ball and Queyranne (2006) for references in the revenue management
literature.)
Related learning paradigms. The general problem of dynamic optimization with limited or
imperfect information has attracted significant attention in several fields; an exhaustive survey of
which is well beyond the scope of this paper. The results discussed next are, at least in spirit,
the most related to our work. In economics, a line of work that traces back to Hannan (1957)
studies settings where the decision maker faces an oblivious opponent, and the benchmark used
to evaluate a given policy is the rewards accumulated by the best possible single action, had the
decision maker known in advance the actions of the adversary. (This is somewhat related to the
full information benchmark that we employ.) See Foster and Vohra (1999) for an excellent review
of work that descends from the Hannan paper, and makes illuminating and subtle connections of
this work to other fields. Related studies in the computer science literature include Auer et al.
(2002) who propose an efficient algorithm for an adversarial version of the multi-armed bandit
problem, and Kleinberg and Leighton (2003) who discuss a revenue management application in
on-line posted-price auctions. (See also Robbins (1952) and Lai and Robbins (1985).) The problem
we study, when viewed through the lens of the previous papers, can be roughly described as a
continuous time multi-armed bandit problem with a (multi-dimensional) continuum of possible
actions and limited resources. This makes the problem fundamentally different from the traditional
multi-armed bandit literature and most adversarial prediction and learning problems (cf. the text
by Cesa-Bianchi and Lugosi (2006) for a recent and comprehensive survey).
Summary. The main purpose of our work is to propose an intuitive and tractable approach
to a broad class of dynamic optimization problems under nonparametric model uncertainty, and
to provide performance guarantees for the proposed methods. To the best of our knowledge the
network revenue management problem studied here has not been addressed to date under any type
of demand function uncertainty. In a sense our work is most closely related to that of Gallego and
van Ryzin (1997) which develops provably good fixed price heuristics for a broad class of network
revenue management problems. Key to their study is the assumption that the demand function
is known to the decision maker. Our problem formulation is similar to that of Gallego and van
Ryzin (1997) but relaxes the full information assumption made there. An important element in
our algorithms is the approximation of the original revenue management problem with a suitable
deterministic counterpart; this follows closely the ideas developed in Gallego and van Ryzin (1997).
The separation of estimation and optimization so that each is performed on a different time scale
is similar to ideas appearing in Iyengar and Zeevi (2005) which studies implications of parameter
uncertainty on performance analysis and control of stochastic systems.
3 Problem Formulation
The model and related assumptions. We consider a revenue management problem in which
a firm sells d different products which are generated (assembled or produced) from ℓ resources.
Let A = [aij ] denote the capacity consumption matrix, whose entries aij ≥ 0, i = 1, . . . , ℓ and
j = 1, . . . , d, denote the number of units of resource i required to generate product j. It is assumed
that the entries of A are integer valued and that it has no zero column. The selling horizon is
denoted by T > 0, and after this time sales are discontinued and there is no salvage value for the
remaining unsold products. Demand for products at any time t ∈ [0, T] is given by a multivariate Poisson process with intensity λ_t = (λ_t^1, ..., λ_t^d), which measures the instantaneous demand rate (in units such as number of products requested per hour, say). This intensity is determined by the price vector at time t, p(t) = (p^1(t), . . . , p^d(t)), through a demand function λ : D_p → R_+^d, where D_p ⊆ R_+^d denotes the set of feasible prices. Thus the instantaneous demand rate at time t is λ_t = λ(p(t)), and the realized demand is a controlled Poisson process.
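To make this model concrete, here is a minimal Python sketch that simulates such a controlled Poisson process by thinning. The demand curve, the rate bound M, and the fixed price are illustrative assumptions, not part of the model above.

    import numpy as np

    rng = np.random.default_rng(0)

    def simulate_demand(price_path, lam, T, d):
        """Simulate a controlled multivariate Poisson process.

        price_path: function t -> price vector p(t) (piecewise constant here)
        lam:        demand function p -> vector of instantaneous rates
        Returns the list of arrival times for each product.
        """
        arrivals = [[] for _ in range(d)]
        # Thinning: simulate a homogeneous Poisson process at an assumed
        # upper-bound rate M and accept a point at time t with probability
        # lam_j(p(t)) / M. Valid as long as every rate stays below M.
        M = 50.0  # assumed upper bound on each product's demand rate
        for j in range(d):
            t = 0.0
            while True:
                t += rng.exponential(1.0 / M)
                if t > T:
                    break
                if rng.random() < lam(price_path(t))[j] / M:
                    arrivals[j].append(t)
        return arrivals

    # Example: one product, exponential demand curve, fixed price 2.0
    demand = simulate_demand(lambda t: np.array([2.0]),
                             lambda p: 10.0 * np.exp(1.0 - p),
                             T=1.0, d=1)
    print(len(demand[0]), "requests for product 1")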
We will assume, unless explicitly specified otherwise, that the feasible price set D_p is a compact convex set. Regarding the demand function λ(·), we assume that it has an inverse, denoted γ(·), and that the revenue function r(λ) := λ · γ(λ) is jointly concave; here, for two vectors y, z ∈ R^d, y · z denotes the usual scalar product and ‖y‖ := max{|y^i| : i = 1, ..., d}. In addition, we assume that the set D_λ := {l : l = λ(p), p ∈ D_p} is convex; these assumptions are quite standard in the revenue management literature (cf. Talluri and van Ryzin (2005)).
We assume that the demand function belongs to the class of functions L := L(K, M, m, p∞) which satisfy the above and, in addition, for finite positive constants K, M, m and a vector p∞ ∈ D_p satisfy:

i.) Boundedness of demand: ‖λ(p)‖ ≤ M for all p ∈ D_p.

ii.) Lipschitz continuity: ‖λ(p) − λ(p′)‖ ≤ K‖p − p′‖ for all p, p′ ∈ D_p.

iii.) Minimum revenue rate: max{p · λ(p) : p ∈ D_p} ≥ m.

iv.) "Shut-off" price: λ(p∞) = 0.
Assumptions i.)-iii.) are quite benign and hold for many demand function models used in the
revenue management literature such as linear, exponential and iso-elastic (Pareto), as long as the
parameters lie in a compact set; see, e.g., Talluri and van Ryzin (2005, §7) for further examples.
The existence of a “shut-off” price in Assumption iv.) is not restrictive from a practical standpoint
since in most applications there exists a finite price that yields zero demand. The assumption that
p∞ ∈ D_p is also not overly restrictive. To see that, consider the case where the demand function is separable, with λ^j(p) = f^j(p^j) for j = 1, ..., d, where the functions f^j(·) are non-increasing and f^j(p_∞^j) = 0. Putting γ^j(l) = inf{p ≥ 0 : λ^j(p) ≤ l}, if we assume that yγ^j(y) are concave for j = 1, . . . , d, then iv.) is clearly satisfied.
Information structure and the optimization problem. Let (p(t) : 0 ≤ t ≤ T ) denote the
price process which is assumed to have sample paths that are right continuous with left limits taking
values in Dp . Let (N 1 (·), ..., N d (·)) be a vector of mutually independent unit rate Poisson processes.
Rt
The cumulative demand for product j up until time t is then given by D j (t) := N j ( 0 λj (p(s))ds).
We say that (p(t) : 0 ≤ t ≤ T ) is non anticipating if the value of p(t) at time t ∈ [0, T ] is only allowed
to depend on past prices {p(s) : s ∈ [0, t)} and demand values {(D 1 (s), . . . , Dd (s)) : s ∈ [0, t)}.
(More formally, the price process is adapted to the filtration generated by the past values of the
demand process and price process.)
We assume that the decision maker does not know the true demand function, and s/he only knows that λ ∈ L. The decision maker is able to continuously observe realized demand at all time instants starting at time 0 and up until the end of the selling horizon T. We shall use π to denote a joint learning and pricing policy that maps the above information structure to a non-anticipating price process (p(t) : 0 ≤ t ≤ T). With some abuse of terminology, we will use the term policy to refer to the price process and the algorithm that generates it interchangeably. For 0 ≤ t ≤ T put

    N^{j,π}(t) := N^j( ∫_0^t λ^j(p(s)) ds ),   for j = 1, . . . , d,    (1)

where N^{j,π}(t) denotes the cumulative demand for product j up to time t under the policy π. Let N^π(t) denote the vector (N^{1,π}(t), ..., N^{d,π}(t)).
Let x = (x^1, x^2, ..., x^ℓ) denote the inventory level of each resource at the start of the selling season. We assume without loss of generality that x^i > 0, i = 1, ..., ℓ. A joint learning and pricing policy π is said to be admissible if the induced price process satisfies

    ∫_0^T A dN^π(s) ≤ x   a.s.,    (2)

    p(s) ∈ D_p,  0 ≤ s ≤ T,    (3)

where A is the capacity consumption matrix defined earlier and vector inequalities are assumed to hold componentwise. It is important to note that while the decision maker does not know the demand function, knowledge of p∞ guarantees that the constraint (2) can be met.
Let P denote the set of admissible learning and pricing policies. The dynamic optimization problem faced by the decision maker under the information structure described above would be: choose π ∈ P to maximize the total expected revenues

    J^π(x, T; λ) := E[ ∫_0^T p(s) · dN^π(s) ].    (4)
There is of course a glaring defect in the above objective: the decision maker is not able to compute
the expectation in (4), and hence to evaluate the performance of any proposed policy, since the true
demand function governing customer requests is not known a priori. This lends further meaning to
the terminology “blind revenue management,” as the decision maker is attempting to optimize (4)
in a blind manner. We will revisit the objective of the decision maker shortly.
The full information benchmark. When the decision maker knows the demand function λ
prior to the start of the selling season, the dynamic optimization problem described in (4) can, at least in theory, be solved; this will be referred to as the full information case. This problem is
precisely the one formulated in Gallego and van Ryzin (1997) who also characterize the optimal
state-dependent pricing policy using dynamic programming. Suppose that we fix an admissible
demand function λ ∈ L. Let Pλ denote the class of policies that “know” the demand function
prior to the start of the selling season, and whose corresponding price process is non anticipating
and satisfies the admissibility conditions (2)-(3) (in other words, when compared to policies in P,
policies in Pλ are allowed to depend on the true value of the demand function λ). Let us define
hZ T
i
∗
J (x, T |λ) := sup E
p(s) · dN π (s) ,
(5)
π∈Pλ
0
where the presence of λ on the left hand side in (5) reflects the fact that the optimization problem is
solved “conditioned” on knowing the demand function a priori. The regularity conditions imposed
earlier in this section guarantee that there exists a Markovian policy in Pλ that achieves the supremum and hence is optimal (see Gallego and van Ryzin (1997) for details). The term Markovian
reflects the fact that the policy will only depend on past demand observations through the current
state of inventory.
Performance measures and the main objective. Clearly the value of the full information
optimization problem (5) will serve as an upper bound on the value of the original optimization
problem described in (4). That is, for any fixed demand function λ ∈ L we have that
    J^π(x, T; λ) / J^*(x, T | λ) ≤ 1   for all admissible policies π ∈ P.

This ratio measures the performance of any admissible policy on a relative scale, expressed as a
fraction of the optimal revenues that are achieved with the aid of an oracle that knows the true
demand function. One would anticipate the percent loss relative to the full information case to
be quite large and the above ratio to be bounded strictly away from 1. As mentioned in the
introduction, the main question that this paper addresses is the following:
Do there exist policies π ∈ P which do not know the demand function, yet by learning it "on the fly" can achieve near-optimal revenues in the sense that J^π(x, T; λ) ≈ J^*(x, T | λ)?
Before attempting to answer this question it is necessary to rule out trivial solutions. In particular, one possible strategy which is not precluded by policies in P is to “guess” the demand function
at time t = 0, and then proceed to optimize the objective using this guess. If the guess is correct,
this policy can generate the full information optimal revenues J ∗ (x, T |λ) by following the optimal
Markovian pricing scheme. Of course this type of guessing policy will almost never be adopted in
practice. One way to exclude it mathematically is to require that the relative performance of a
joint learning-pricing policy π ∈ P be measured with respect to the worst possible demand function
in the class L, namely

    inf_{λ ∈ L} J^π(x, T; λ) / J^*(x, T | λ).    (6)
The criterion in (6) can be viewed as the result of a two step procedure: first the decision maker
selects a policy π ∈ P, and then “nature” picks the worst possible demand function λ ∈ L for
this particular policy. Measuring performance in this manner guarantees that a “good” policy will
perform well regardless of the true underlying demand function. In addition, the decision maker’s
objective is now well posed, as s/he can, at least in theory, find the “worst” demand function
λ ∈ L corresponding to “each” policy. The fact that such policies can only learn the true demand
function by observing realized demand over time introduces an obvious tension between exploration/estimation (demand learning) and optimization (pricing). Returning to the main question
posed above, our interest is in constructing suitable learning and pricing policies that make the
ratio in (6) large (and ideally close to one).
4 Main Results
The nonparametric pricing algorithm. Before introducing the algorithm we need to define a price grid. Let B_p := ∏_{i=1}^d [p_i, p̄_i] denote the minimum volume hyper-rectangle in R_+^d such that B_p ⊇ D_p. Given a positive integer κ, one can divide each interval [p_i, p̄_i], i = 1, ..., d, into ⌊κ^{1/d}⌋ intervals of equal length. Define the resulting grid of points in R_+^d as B_p^κ. Let e = (1, . . . , 1) ∈ R^ℓ. The following algorithm defines a class of admissible learning and pricing policies that are parametrized by a triplet of tuning parameters (τ, κ, δ), where τ ∈ (0, T], κ is a positive integer, and δ > 0.
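A small Python sketch of the grid construction follows. The rectangle bounds are those of Example 2 below, and for simplicity D_p is assumed to coincide with B_p, so no grid points need to be discarded; one natural reading of B_p^κ, taking the endpoints of the ⌊κ^{1/d}⌋ intervals in each coordinate, is used here.

    import itertools
    import numpy as np

    def price_grid(lo, hi, kappa):
        """Build a grid of candidate price vectors: split each coordinate
        interval [lo_i, hi_i] into floor(kappa**(1/d)) equal pieces and
        take all resulting grid points (Cartesian product of endpoints)."""
        d = len(lo)
        m = int(np.floor(kappa ** (1.0 / d)))  # intervals per dimension
        axes = [np.linspace(lo[i], hi[i], m + 1) for i in range(d)]
        return [np.array(p) for p in itertools.product(*axes)]

    grid = price_grid(lo=[0.5, 0.5], hi=[5.5, 5.0], kappa=16)
    print(len(grid), "candidate price vectors")  # (4 + 1)^2 = 25 points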
Algorithm 1: π(τ, κ, δ)

Step 1. Initialization:

(a) Set the learning interval to be [0, τ], and set κ to be the number of prices to experiment with. Put ∆ = τ/κ.

(b) Define P^κ = {p_1, ..., p_κ} to be the prices to experiment with over [0, τ], where P^κ ⊇ B_p^κ ∩ D_p.

Step 2. Learning/experimentation:

(a) On the interval [0, τ], apply p_i from t_{i−1} = (i − 1)∆ to t_i = i∆, i = 1, 2, ..., κ, as long as inventory is positive for all resources. If some resource is out of stock, apply p∞ up until time T and STOP.

(b) Compute

    d̂(p_i) = (total demand over [t_{i−1}, t_i)) / ∆,   i = 1, ..., κ.    (7)

Step 3. Optimization:

For i = 1, ..., κ:
    If A d̂(p_i) T ≤ x + δe, then r̂(p_i) = p_i · d̂(p_i); else r̂(p_i) = 0.
Set p̂ = arg max{r̂(p) : p ∈ P^κ}.

Step 4. Pricing:

On the interval (τ, T], apply p̂ until some resource is out of stock, then apply p∞ for the remaining time.
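The following Python sketch mirrors Steps 2-3 for a single replication. It is a simplified illustration, not the paper's exact procedure: demand over each learning interval is drawn directly as Poisson counts, the stock-out check of Step 2(a) is omitted, and the demand curve, inventory, and tuning values are placeholder choices.

    import numpy as np

    rng = np.random.default_rng(1)

    def algorithm1(prices, lam, A, x, T, tau, delta):
        """Sketch of Algorithm 1 (learning phase + fixed-price selection).

        prices: list of candidate price vectors (the grid P^kappa)
        lam:    true demand function, used here only to simulate observations
        Returns the near-optimal fixed price p_hat for the interval (tau, T].
        """
        kappa = len(prices)
        Delta = tau / kappa
        r_hat = np.zeros(kappa)
        for i, p in enumerate(prices):
            # Step 2: observe Poisson demand over an interval of length Delta
            counts = rng.poisson(lam(p) * Delta)
            d_hat = counts / Delta                # empirical demand rate, eq. (7)
            # Step 3: keep the price only if it is (almost) feasible
            if np.all(A @ d_hat * T <= x + delta):
                r_hat[i] = p @ d_hat              # empirical revenue rate
        return prices[int(np.argmax(r_hat))]

    # Single-product illustration with the exponential curve of Section 5
    prices = [np.array([p]) for p in np.linspace(0.5, 5.5, 12)]
    p_hat = algorithm1(prices, lam=lambda p: 10 * np.exp(1 - p),
                       A=np.array([[1.0]]), x=np.array([5000.0]),
                       T=1.0, tau=0.1, delta=10.0)
    print("selected fixed price:", p_hat)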
We will describe the intuition behind the construction of Algorithm 1 in the discussion following
Theorem 1. Regarding Step 4, it is clear that any practical implementation of this algorithm would
not “shut off” all the demand once only one resource becomes unavailable but would rather do so
only for those products that use the unavailable resource. The result we present in Theorem 1 is
valid for policies that improve upon the above algorithm by refining Step 4 through partial and/or
gradual demand “shut off.”
Asymptotic analysis. We consider a regime in which both the number of initial resources
(or capacity) as well as potential demand grow proportionally large. In particular for any positive
integer n, the initial resource vector is now assumed to be xn = nx and the demand function is
λn (·) = nλ(·). Thus, n determines both the order of magnitude of inventories and the rate of
demand; when n is large this scaling characterizes a regime with a high volume of sales. We will
denote by P_n the set of admissible policies for a system with scale n, and the expected revenues under a policy π_n ∈ P_n will be denoted J_n^π(x, T; λ). With some abuse of notation, we will occasionally use π to denote a sequence {π_n : n = 1, 2, . . .} as well as any element of the sequence, omitting the subscript "n" to avoid cluttering the notation. For each n = 1, 2, . . ., let J_n^*(x, T | λ) denote the optimal revenues that can be achieved in the full information case, i.e., when the demand function is known a priori. Of course, for all n = 1, 2, . . ., we have that J_n^π(x, T; λ) ≤ J_n^*(x, T | λ). With
this in mind, the following definition characterizes admissible policies that have “good” asymptotic
properties.
Definition 1 (Asymptotic Optimality) A sequence of admissible policies π_n ∈ P_n is said to be asymptotically optimal if

    inf_{λ ∈ L} J_n^π(x, T; λ) / J_n^*(x, T | λ) → 1   as n → ∞.    (8)
Asymptotically optimal policies are those that achieve the full information upper bound on revenues
as n → ∞, uniformly over the class of admissible demand functions.
For the purpose of asymptotic analysis we use the following notation: for real valued positive
sequences {an } and {bn } we write an = O(bn ) if an /bn is bounded from above for large enough
values of n (i.e., lim sup an /bn < ∞). If an /bn is also eventually bounded away from zero (i.e.,
lim inf an /bn > 0) then we write an ≍ bn .
Theorem 1 There exists a sequence of policies {π(τ_n, κ_n, δ_n)} defined by Algorithm 1 that is asymptotically optimal.

Remark 1 (rates of convergence) The proof of the above result is constructive in that it exhibits a set of tuning parameters such that π_n = π(τ_n, κ_n, δ_n) is asymptotically optimal. In particular, for

    τ_n ≍ n^{−1/(d+3)},   κ_n ≍ n^{d/(d+3)},   δ_n = C n (log n)^{1/2} n^{−1/(d+3)},    (9)

with C > 0 sufficiently large, the convergence rate in (8) is given by

    sup_{λ ∈ L} [ 1 − J_n^π(x, T; λ) / J_n^*(x, T | λ) ] = O( (log n)^{1/2} / n^{1/(d+3)} )   as n → ∞,    (10)

where d denotes the number of products.
The existence of this sequence of policies shows that the value of full information diminishes for large n. The choice of tuning parameters in (9) optimizes the rate of convergence, as can be seen from the last step in the proof. Note that τ_n shrinks as n gets large, and in that sense the learning horizon is "short" relative to the sales horizon [0, T]. Ignoring logarithmic terms, we can rewrite (10) informally as J_n^π(x, T; λ)/J_n^*(x, T | λ) ≈ 1 − Cτ_n/T. Hence the loss in revenues is proportional to the relative size of the learning horizon.
Intuition and proof sketch. Step 1 in Algorithm 1 consists of setting the first two tuning parameters: τ_n determines the length of the interval used for learning the demand function, and κ_n sets the number of prices that are experimented with on that interval. In Step 2, prices in the set P^{κ_n} are used to obtain an approximation of the demand function, which as n grows large becomes increasingly accurate due to the strong law of large numbers.
To understand Step 3, imagine that the demand function λ(·) was known a priori, and demand was deterministic rather than governed by a Poisson process. The revenue maximization problem would then be given by the following deterministic dynamic optimization problem

    max { ∫_0^T r(λ(p(s))) ds : ∫_0^T Aλ(p(s)) ds ≤ x,  p(s) ∈ D_p for all s ∈ [0, T] }.    (11)
Gallego and van Ryzin (1997) show that the solution to (11) is constant over time. Their work
also establishes that this fixed price yields close to optimal performance when used in the original
stochastic system (in the asymptotic regime under consideration). This sheds some light on the
optimization problem articulated in Step 3 of the algorithm: based on the observations one forms an
estimate of the revenue function and then proceeds to solve an empirical version of the deterministic
problem (11), using this solution for the remainder of the time horizon. The choice of the tuning
parameter δn allows constraints to be violated, taking into account the estimation “noise.” This
avoids restricting too drastically the set of candidate prices in Step 4.
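Since a constant price solves (11), the deterministic relaxation reduces to a feasibility-constrained search over fixed prices. Here is a minimal Python sketch for a single product with the linear demand curve used later in the examples; the grid resolution and inputs are arbitrary illustrative choices, and the paper's Step 3 uses the estimated demand in place of the true λ.

    import numpy as np

    def deterministic_fixed_price(prices, lam, A, x, T):
        """Solve the fixed-price reduction of problem (11):
        maximize T * p . lam(p) over p, subject to A lam(p) T <= x,
        by grid search over candidate prices."""
        best_p, best_rev = None, -np.inf
        for p in prices:
            rate = lam(p)
            if np.all(A @ rate * T <= x) and p @ rate * T > best_rev:
                best_p, best_rev = p, p @ rate * T
        return best_p, best_rev

    prices = [np.array([v]) for v in np.linspace(0.1, 6.0, 600)]
    p_star, rev = deterministic_fixed_price(
        prices, lam=lambda p: np.maximum(10 - 2 * p, 0.0),
        A=np.array([[1.0]]), x=np.array([5.0]), T=1.0)
    print(p_star, rev)  # inventory binds: p ~ 2.5, revenue ~ 12.5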
The choice of the key tuning parameters, τ_n and κ_n, is meant to balance several contradicting objectives. Increasing τ_n results in a longer time horizon over which the demand function is estimated; however, by doing so there is also a potential loss in revenues that stems from spending "too much time" on learning and exploration. In addition, for every fixed choice of τ_n, there is an inherent tradeoff between increasing the number of prices to experiment with, κ_n, and the accuracy of estimating the demand function on this price grid, which is dictated by the length ∆_n = τ_n/κ_n. We now analyze how these parameters impact the ratio J_n^π/J_n^*. The first error source can be interpreted as an "exploration bias" that is due to experimenting with various prices without using any information about the demand. This will result in potential losses of order τ_n. The second error source is deterministic and stems from using only a finite number of prices to search for the optimal solution of (11) (since the demand function is assumed to be K-Lipschitz). The maximal loss related to this error is of order 1/κ_n^{1/d}. The last source of error is stochastic, arising from the fact that only noisy observations of the demand function are available. Since each price is held fixed for ∆_n = τ_n/κ_n units of time, this introduces an error of order (nτ_n/κ_n)^{−1/2}; this observation is less transparent and is rigorously detailed in the proof using uniform probability bounds for deviations of random variables from their expectation. The overall error is simply the sum of the three sources detailed above, namely

    1 − J_n^π/J_n^* ≈ C( τ_n + 1/κ_n^{1/d} + κ_n^{1/2}/(nτ_n)^{1/2} ).    (12)
This last expression captures mathematically the tension that must be resolved in choosing the tuning parameters associated with Algorithm 1. Balancing the three error terms in (12) yields the rate of convergence in (10).
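For completeness, here is the balancing computation behind (9), with logarithmic factors suppressed (a sketch of the last step of the proof):

    \tau_n \;\asymp\; \frac{1}{\kappa_n^{1/d}} \;\asymp\; \frac{\kappa_n^{1/2}}{(n\tau_n)^{1/2}}
    \;\Longrightarrow\; \kappa_n \asymp \tau_n^{-d}
    \;\Longrightarrow\; \tau_n \asymp \tau_n^{-(d+1)/2}\, n^{-1/2}
    \;\Longrightarrow\; \tau_n^{(d+3)/2} \asymp n^{-1/2},

so that τ_n ≍ n^{−1/(d+3)} and κ_n ≍ τ_n^{−d} ≍ n^{d/(d+3)}, at which point each of the three terms in (12) is of order n^{−1/(d+3)}, matching (10).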
Price restricted case. Algorithm 1 and the proof of its asymptotic optimality rely heavily on the fact that the set of feasible prices D_p is convex. In many applications, this assumption is not satisfied and the set of feasible prices may be discrete, e.g., D_p^k = {p_1, ..., p_k}. (The reasons for such constraints range from industry practices to competition, etc.) We continue to define P as the set of admissible learning and pricing policies when the demand function is unknown, while P_λ denotes the set of pricing policies when λ is known a priori. Here J_n^*(x, T | λ) is to be interpreted as the optimal revenues in the full information case when the prices are restricted to the discrete price set D_p^k. Under the following technical condition, a result similar to Theorem 1 can be established.

Assumption 1 There exists a constant m_1 > 0 such that any function λ ∈ L is bounded below by m_1 on D_p^k \ {p∞}, i.e., inf_{λ∈L} min_{p ∈ D_p^k \ {p∞}} {λ^j(p) : j = 1, . . . , d} > m_1.
We introduce below an algorithm for the multiproduct price restricted case. The intuition behind its construction is similar to the one underlying Algorithm 1. What distinguishes it from the latter are the following: i) there are only two tuning parameters (τ, δ), since the set of feasible prices is discrete; and ii) the deterministic problem that is solved in Step 3 can be formulated as a linear program whose solution prescribes the amount of time that each price is used.
Algorithm 2: π(τ, δ)

Step 1. Initialization:

Set the learning interval to be [0, τ] and put ∆ = τ/k.

Step 2. Learning/experimentation:

(a) On the interval [0, τ], apply p_i from t_{i−1} = (i − 1)∆ to t_i = i∆, i = 1, 2, ..., k, as long as inventory is positive for all resources. If some resource is out of stock, apply p∞ up until time T and STOP.

(b) Compute

    d̂(p_i) = (total demand over [t_{i−1}, t_i]) / ∆,   i = 1, ..., k.    (13)

Step 3. Optimization:

Let t̂ = (t̂_1, ..., t̂_k) be the solution of the linear program

    max { Σ_{i=1}^k p_i · d̂(p_i) t_i : Σ_{i=1}^k A d̂(p_i) t_i ≤ x − Aeδ,  Σ_{i=1}^k t_i ≤ T − τ,  t_i ≥ 0, i = 1, ..., k }.    (14)

Step 4. Pricing:

For each i = 1, ..., k, apply p_i for t̂_i time units on (τ, T] until some resource is out of stock, then apply p∞ for the remaining time.
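A sketch of Step 3 using an off-the-shelf LP solver (scipy's linprog) follows. The price set, demand estimates, and inventory below are made-up inputs for illustration, and e is taken here as the vector of ones in R^d, so that Aeδ is well defined.

    import numpy as np
    from scipy.optimize import linprog

    def algorithm2_lp(prices, d_hat, A, x, T, tau, delta):
        """Solve the linear program (14): allocate time t_i to each tested
        price to maximize estimated revenue, subject to the slack-adjusted
        resource constraints and the remaining-horizon constraint."""
        k = len(prices)
        # linprog minimizes, so negate the revenue-rate coefficients
        c = -np.array([prices[i] @ d_hat[i] for i in range(k)])
        # Resource constraints: sum_i A d_hat[i] t_i <= x - A e delta
        A_res = np.column_stack([A @ d_hat[i] for i in range(k)])
        b_res = x - (A @ np.ones(A.shape[1])) * delta
        # Horizon constraint: sum_i t_i <= T - tau
        A_ub = np.vstack([A_res, np.ones((1, k))])
        b_ub = np.concatenate([b_res, [T - tau]])
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * k)
        return res.x  # time to spend at each price

    # Two products, three resources (the network of Example 2 below)
    A = np.array([[1.0, 1.0], [3.0, 1.0], [0.0, 5.0]])
    prices = [np.array([1.0, 2.0]), np.array([2.0, 3.0]), np.array([4.0, 4.0])]
    d_hat = [np.array([4.0, 5.0]), np.array([3.0, 3.0]), np.array([1.0, 1.0])]
    t_hat = algorithm2_lp(prices, d_hat, A, x=np.array([5.0, 8.0, 10.0]),
                          T=1.0, tau=0.1, delta=0.05)
    print("time allocation:", t_hat)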
We now analyze the performance of Algorithm 2 in the context of the asymptotic regime introduced earlier in this section.
Theorem 2 Suppose that the set of prices is D_p^k = {p_1, ..., p_k} and let Assumption 1 hold. Then there exists a sequence of policies {π(τ_n, δ_n)} defined by Algorithm 2 that is asymptotically optimal.

Remark 2 (rates of convergence) Setting τ_n ≍ n^{−1/3} and δ_n = C n (log n)^{1/2} n^{−1/3} with C > 0 sufficiently large, we get that

    sup_{λ ∈ L} [ 1 − J_n^π(x, T; λ) / J_n^*(x, T | λ) ] = O( (log n)^{1/2} / n^{1/3} )   as n → ∞.    (15)
We note that, in contrast with (10), the rate of convergence in this case does not depend on the
number of products (d).
5 Illustrative Numerical Examples

Note that J_n^*(x, T | λ) is not readily computable in most cases; however, an upper bound is easy to obtain through the value of the deterministic optimization problem given in (11). In addition, this upper bound is fairly tight for moderate sized problems (see Gallego and van Ryzin (1997)); hence one can compute a "good" lower bound on the ratio J_n^π(x, T; λ)/J_n^*(x, T | λ) based on this deterministic relaxation. The results for learning and pricing policies π_n depicted in Tables 1-4 are based on running 500 independent simulation replications, from which the performance indicators were derived by averaging. The standard error for estimating J_n^π(x, T; λ)/J_n^*(x, T | λ) was below 0.5% in all cases.
Example 1 (Single product example) We start with a single product example, i.e., d = ℓ = 1, and the matrix A is just a scalar equal to 1 (i.e., the inventory is counted in units of this product). We consider two underlying demand models: an exponential and a linear model. The parameters used to generate the results depicted in Table 1 are as follows: λ(p) = 10 exp(1 − p). In Table 2, we took λ(p) = (10 − 2p)^+. For both cases, the set of feasible prices was taken to be D_p = [0.1, 6] and the time horizon was taken to be T = 1.
In both tables, we present the ratio of the performance of the policy π defined through Algorithm 1 to the optimal performance in the full information case for three problem sizes (n) and seven normalized inventory sizes (x). In each case, we indicate the number of prices used by the policy, κ; the time dedicated to learning, τ; and the proportion of initial inventory sold during the learning phase, α. (The tuning parameters are defined via (9) with C = 1 for each n.)
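To illustrate how such estimates can be produced, here is a compact Monte Carlo sketch in Python for the exponential model (n = 10³, x = 5). It simplifies Algorithm 1 by drawing interval demand directly as Poisson counts and capping sales at the remaining inventory instead of explicitly switching to p∞, and it uses the deterministic bound J^D in place of J_n^*, as discussed above.

    import numpy as np

    rng = np.random.default_rng(2)

    def run_once(n, x, T, lam, kappa, tau, delta):
        """One replication of (a simplified) Algorithm 1 on the scaled
        single-product system: inventory n*x, demand rate n*lam(p)."""
        prices = np.linspace(0.1, 6.0, kappa)
        Delta = tau / kappa
        inv, rev = n * x, 0.0
        r_hat = np.zeros(kappa)
        for i, p in enumerate(prices):           # learning phase on [0, tau]
            sold = min(rng.poisson(n * lam(p) * Delta), inv)
            inv -= sold
            rev += p * sold
            d_hat = sold / Delta
            if d_hat * T <= n * x + delta:       # near-feasibility check (A = 1)
                r_hat[i] = p * d_hat
        p_hat = prices[int(np.argmax(r_hat))]    # pricing phase on (tau, T]
        sold = min(rng.poisson(n * lam(p_hat) * (T - tau)), inv)
        return rev + p_hat * sold

    lam = lambda p: 10.0 * np.exp(1.0 - p)       # exponential model of Example 1
    n, x, T = 1000, 5.0, 1.0
    kappa, tau = 7, 0.17
    delta = n * np.sqrt(np.log(n)) * n ** (-0.25)  # C = 1 in (9), d = 1
    revs = [run_once(n, x, T, lam, kappa, tau, delta) for _ in range(200)]
    # Deterministic upper bound (11): constant price, grid search
    grid = np.linspace(0.1, 6.0, 2000)
    JD = n * max(p * lam(p) * T for p in grid if lam(p) * T <= x)
    print("estimated ratio J_pi / J_D:", np.mean(revs) / JD)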
    x     n = 10² (κ = 5, τ = 0.31)    n = 10³ (κ = 7, τ = 0.17)    n = 10⁴ (κ = 12, τ = 0.10)
          J_n^π/J_n^*      α           J_n^π/J_n^*      α           J_n^π/J_n^*      α
    3         .42          84              .69          39              .91          18
    5         .61          51              .78          23              .95          11
    7         .78          36              .68          17              .85           8
    9         .74          28              .90          13              .94           6
    11        .74          23              .90          11              .93           5
    13        .74          18              .90           9              .93           4
    15        .74          17              .90           8              .93           4

Table 1: Exponential demand function. J_n^π/J_n^* represents a lower bound on the ratio of the performance of the policy π(τ, κ, δ) to the optimal performance in the full information case, and x is the normalized inventory level. Here κ = number of prices used by the policy π; τ = fraction of time allocated to learning; and α = proportion of inventory sold during the learning phase (in %).
    x     n = 10² (κ = 5, τ = 0.31)    n = 10³ (κ = 7, τ = 0.17)    n = 10⁴ (κ = 12, τ = 0.10)
          J_n^π/J_n^*      α           J_n^π/J_n^*      α           J_n^π/J_n^*      α
    2         .45          85              .72          44              .83          23
    3         .59          57              .73          29              .87          15
    4         .77          42              .90          22              .89          11
    5         .83          33              .95          18              .95           9
    6         .83          28              .88          15              .95           8
    7         .83          24              .88          12              .95           7
    8         .82          21              .88          11              .95           6

Table 2: Linear demand function. J_n^π/J_n^* represents a lower bound on the ratio of the performance of the policy π(τ, κ, δ) to the optimal performance in the full information case, and x is the normalized inventory level. Here κ = number of prices used by the policy π; τ = fraction of time allocated to learning; and α = proportion of inventory sold during the learning phase (in %).
We observe that with inventory levels of the order of a few thousands, the expected revenues
under the proposed policy are close to 90% of the expected revenues in the full information case
where knowledge of the demand function enables us to optimally solve the dynamic pricing problem.
Note that for inventories of the order of thousands, the policy utilizes approximately 17.7% of the
time horizon T to learn the demand function and experiments with only 7 prices. It might also seem
surprising that the performance of the algorithm does not necessarily improve with the inventory
size. For example, the entries for x = 7 and x = 9 in Table 1 show that the performance of
the algorithm is relatively better in the first case (.78 versus .74 for n = 10²). When analyzing these results, one should keep in mind that both the numerator and the denominator in the ratio J_n^π/J_n^* vary, and hence there is no particular reason to expect monotonic behavior of the ratio. In addition, as the initial inventory changes, the price sought in Step 3 of Algorithm 1 varies and might be further away from one of the points contained in the price grid P^{κ_n}.
Example 2 (Network example) We consider now an example with two products and three resources where the underlying demand is separable. In particular, we use the following demand model: λ(p^1, p^2) = (5 − p^1, 9 − 2p^2)′. The set of feasible prices is D_p = [0.5, 5.5] × [0.5, 5] and T = 1. The first, second and third rows of the capacity consumption matrix A are given by (1, 1), (3, 1) and (0, 5), respectively. For example, this means that product 1 requires 1 unit of resource 1, 3 units of resource 2 and no units of resource 3. In Table 3, we give performance results for the policy defined by Algorithm 1 (with tuning parameters given in (9) with C = 1 for each n).
    x               n = 10² (κ = 10, τ = 0.40)   n = 10³ (κ = 17, τ = 0.25)   n = 10⁴ (κ = 50, τ = 0.16)
                    J_n^π/J_n^*                  J_n^π/J_n^*                  J_n^π/J_n^*
    (3, 5, 7)           .64                          .72                          .73
    (15, 5, 7)          .64                          .73                          .74
    (15, 8, 7)          .55                          .77                          .72
    (15, 8, 30)         .71                          .81                          .88
    (15, 12, 30)        .65                          .86                          .90

Table 3: Network example with linear demand function. J_n^π/J_n^* represents a lower bound on the ratio of the performance of the policy π(τ, κ, δ) to the optimal performance in the full information case, and x is the normalized inventory level. Here κ = number of prices used by the policy π; τ = fraction of time allocated to learning.
Contrasting with the single product example, we clearly see the effects of dimensionality coming into play. For problem sizes of the order of 10³, the number of price points tested grows from 7 in the single product case to 17 in the two product case. In conjunction, the time allocated to learning increases from 17% to 25% of the selling season. Focusing on performance, we note that for problems whose size is of the order of thousands, the performance of the proposed policy exceeds 72% of the optimal full information revenues for all initial inventory vectors tested.
Example 3 (Network example in the price restricted case) The parameters of the demand function are taken to be as in Example 2; however, now the set of prices is restricted to the finite set D_p^5 = {(0.5, 3), (0.5, 0.5), (1.1, 2), (4, 4), (4, 6.5)}. In Table 4, we illustrate the performance of the policies defined by Algorithm 2 with τ_n = n^{−1/3} and δ_n = (5 log n)^{1/2} n^{−1/3}.
    x               n = 10² (τ = 0.22)   n = 10³ (τ = 0.10)   n = 10⁴ (τ = 0.05)
                    J_n^π/J_n^*          J_n^π/J_n^*          J_n^π/J_n^*
    (3, 5, 7)           .41                  .71                  .93
    (15, 5, 7)          .41                  .72                  .93
    (15, 8, 7)          .72                  .87                  .95
    (15, 8, 30)         .82                  .91                  .95
    (15, 12, 30)        .83                  .92                  .96

Table 4: Network example (price restricted case) with linear demand function. J_n^π/J_n^* represents a lower bound on the ratio of the performance of the policy π(τ, δ) to the optimal performance in the full information case, and x is the normalized inventory level. Here τ = fraction of time allocated to learning.
The results in Table 4 show that the policies consistently exceed 93% of the full information benchmark for n = 10⁴, illustrating the faster convergence rate claimed in Remark 2 (in comparison to that in Remark 1). It is also interesting to note that such performance is reached by allocating only 5% of the selling horizon to the learning phase.
6 Concluding Remarks

The curse of dimensionality and efficiency of the algorithms. When prices are restricted to a discrete set, the asymptotic performance of Algorithm 2, measured in terms of the rate of convergence given in Remark 2, is independent of the number of products being priced. In contrast, when the price set is not discrete, one needs to experiment with sufficiently many price combinations to "cover" the domain of the unknown demand function. This approach suffers from the curse of dimensionality, evident in the rate guarantee given in Remark 1 (following Theorem 1) which degrades as the number of products d increases. Numerical results in Section 5 clearly illustrate the difference between the price restricted and unrestricted cases with regard to these dimensionality effects. This problem will persist in any scheme that involves static sampling of the
price domain, and one would have to resort to adaptive methods in order to improve performance.
If one restricts the class of demand functions by imposing further smoothness assumptions, then
it is possible to “mitigate” the curse of dimensionality. Essentially, as smoothness increases one
needs fewer points to construct a good approximation of the demand function. This direction is
very appealing from a practical implementation perspective.
On the Poisson process assumption. We have made the assumption that requests for products arrive according to a Poisson process whose rate is the underlying demand function (evaluated
at the given price). As stated already in the introduction, this assumption is made primarily for
concreteness and in order to keep technical details to a bare minimum. In essence, the notion of
asymptotic optimality we advocate in this paper only relies on a rough estimate of the rate of
convergence in the strong law of large numbers. Thus, parallel results to the ones given in Section
4 can be derived under far more general assumptions on the underlying point process that governs
demand.
Extensions. Our approach hinges on the fact that the revenue management problem being
discussed can be “well approximated” by an appropriate deterministic relaxation which admits
a simple solution. This is encoded into Step 3 of almost all algorithms described in this paper.
Roughly speaking, this ensures that a static fixed price nearly maximizes revenues in the full
information case (cf. Gallego and van Ryzin (1994, 1997)). Problems that admit such structure
appear in numerous other contexts (for examples that focus on pricing in service systems see, e.g.,
Paschalidis and Tsitsiklis (2000) and Maglaras and Zeevi (2005)), hence the techniques developed
in this paper may prove useful in those problems as well.
Adaptive algorithms. One aspect that has not been addressed in the present paper is that
of implementation. Even though the performance of the proposed algorithms was shown to be
near-optimal, the decision maker will not necessarily wish to fully separate the learning phase
from the pricing phase in the manner prescribed by Algorithms 1 and 2. In particular, it may
be appealing to make the learning adaptive so that only relevant regions of the feasible price set
are explored in the experimentation phase. That is, the estimation and optimization stages might
be pursued simultaneously rather than sequentially. Adaptive schemes can be used to perform a
better localized search for the near optimal fixed price, and can also exploit further smoothness
assumptions characterizing the unknown demand function to reduce the optimality gap.
Exploiting parametric assumptions. Consider a scenario in which the demand function has
a known parametric structure, with parameter values that are unknown to the decision maker.
The obvious question here is whether one can construct algorithms which exploit this information
and achieve better performance, relative to the nonparametric methods studied in this paper. The
main point to consider here is that any algorithm that relies on parametric assumptions involves
significant model risk, as the true demand function may not (and most likely will not) belong to a
parametric family. Addressing the above question would provide a means for rigorously quantifying
the “price” that one pays for eliminating model misspecification risk.
Treating time-inhomogeneities. The question here is the following: how should a firm jointly
learn and price when the demand function is unknown and time dependent? It stands to reason
that if one wants to capture a rich time inhomogeneous structure, then one would need to resort
to nonparametric approaches. The method developed in this paper hopefully provides a first step
in this direction.
A Proofs of Main Results
Notation. In what follows, if x and y are two vectors, x ≰ y if and only if x^i > y^i for at least one component i; x^+ will denote the vector whose ith component is max{x^i, 0}. We define ā := max{a_{ij} : 1 ≤ i ≤ ℓ, 1 ≤ j ≤ d}, where a_{ij} are the entries of the capacity consumption matrix A. C_i, i ≥ 1, will denote positive constants which are independent of a given demand function, but may depend on the parameters of the class of admissible demand functions L and on A, x and T. Recall that e denotes the vector of ones in R^ℓ. For a sequence {a_n} of real numbers, we will say it converges to infinity at a polynomial rate if there exists β > 0 such that lim inf_{n→∞} a_n/n^β > 0. With some abuse of notation, for a vector y ∈ R_+^d and a d-vector of unit rate Poisson processes N(·), we will use N(y) to denote the vector with ith component N^i(y^i), i = 1, . . . , d. Finally, we record two comments that will be used in the proofs.
Comment 1. Recall the definition of problem (4). Since Dp is bounded, the price charged for any
product never exceeds, say M̄ . Consider a system where backlogging is allowed in the following
sense: for each unit of resource backlogged the system incurs a penalty of M̄ . Recall that A is
assumed to be integer valued with no zero column, and hence anytime the new system receives a
request such that no sufficient resources are available to fulfill it, a penalty of at least M̄ is incurred.
Consider any admissible policy π that applies p∞ for the remaining time horizon as soon as one
resource is out of stock. (Note that all the policies introduced in the main text are of this form.)
Since M̄ exceeds the price that the system receives, the expected revenues of such a policy π in
the original system J π (x, T ; λ) are bounded below by the ones in the new system (note that in the
latter, π does not apply p∞ if the system runs out of any resource).
Comment 2. We will denote by J^D(x, T | λ) the optimal value of the deterministic relaxation (11). First note that J_n^D = nJ^D. We will also use the fact that

    inf_{λ ∈ L} J^D(x, T | λ) ≥ m^D,

where m^D = mT′ > 0 and T′ = min{T, min_{1≤i≤ℓ} x^i/(āMd)}. Indeed, for any λ ∈ L, there is a price q ∈ D_p such that r(q) ≥ m. Consider the policy that applies q on [0, T′] and then applies p∞ up until T. This solution is feasible since Aλ(q)T′ ≤ dāMT′e ≤ x. In addition, the revenues generated by this policy are at least mT′.
Proof of Theorem 1. Fix λ ∈ L. For simplicity, we restrict attention to the product set
Dp = Π_{i=1}^{d} [p_i, p̄_i]. Let M̄ = max_{1≤i≤d} p̄_i be the maximum price a customer will ever pay for a
product. It is easy to verify that the deterministic optimization problem given in (11) is a convex
problem whose solution is given by a constant price vector p̃ (cf. Gallego and van Ryzin (1997)).
Let π be the policy defined by means of Algorithm 1.
Step 1. We first focus on the learning and optimization phases. Let τn be such that
τn = o(1) and nτn → ∞ at a polynomial rate. Let κn be a sequence of integers such that κn → ∞
and n∆n := nτn/κn → ∞ at a polynomial rate. Divide each interval [p_i, p̄_i], i = 1, ..., d, into
⌊κn^{1/d}⌋ equal length subintervals and consider the resulting grid in Dp. The latter has κ′n = ⌊κn^{1/d}⌋^d
hyper-rectangles. For each one, let pi be the largest vector (where the largest vector of a hyper-rectangle
Π_{i=1}^{d} [ai, bi] is defined to be (b1, ..., bd)) and consider the set P^{κ′n} = {p1, p2, ..., p_{κ′n}}. Note that
κ′n/κn → 1 as n → ∞ and, with some abuse of notation, we use κn and κ′n interchangeably.
Now partition [0, τn] into κn intervals of length ∆n and apply the price vector pi on the ith
interval. Define

    λ̂(pi) = [ N(n∆n Σ_{j=1}^{i} λ(pj)) − N(n∆n Σ_{j=1}^{i−1} λ(pj)) ] / (n∆n),   i = 1, ..., κn,

where N(·) is the d-vector of unit rate Poisson processes. Thus λ̂(pi) denotes the number of requests
for each product over successive intervals of length ∆n, normalized by n∆n.
We now choose the “best” price among the “almost feasible” prices. Specifically, we let
δn = C1(log n)^{1/2} max{1/κn^{1/d}, (n∆n)^{−1/2}}, where C1 > 0 is a design parameter to be chosen later, and
set r̂(pi) = pi · λ̂(pi) if Aλ̂(pi)T ≤ x + δn e; otherwise set r̂(pi) = 0. The objective of this step is to
discard solutions of the deterministic problem which are essentially infeasible. (The slack term δn
allows for “noise” in the observations.) Let

    p̂ = pi∗,  where i∗ = arg max{ r̂(pi), i = 1, ..., κn }.   (A-1)
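To make the learning phase concrete, the following sketch simulates it for a single product
(d = 1) and a single resource. The exponential demand curve, the horizon and capacity values,
and the choice C1 = 1 are illustrative assumptions only, not part of the paper's specification.

```python
# A minimal sketch of the learning phase of Algorithm 1 for a single
# product (d = 1) with one resource; the demand curve lam(), the scale n,
# and all tuning constants are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

d, T, x = 1, 1.0, 0.8                     # one product, one resource
A = np.array([[1.0]])                     # capacity consumption matrix
lam = lambda p: np.array([2.0 * np.exp(-p[0])])  # hypothetical demand curve
p_lo, p_hi = 0.5, 3.0

n = 10_000
tau_n = n ** (-1.0 / (d + 3))             # length of the learning phase
kappa_n = int(n ** (d / (d + 3.0)))       # number of grid prices
n_dt = n * tau_n / kappa_n                # n * Delta_n
delta_n = np.sqrt(np.log(n)) * max(1.0 / kappa_n, n_dt ** -0.5)  # C1 = 1

grid = np.linspace(p_lo, p_hi, kappa_n)
r_hat, lam_hat = np.zeros(kappa_n), np.zeros(kappa_n)
for i, p in enumerate(grid):
    # Observed demand at price p over an interval of length Delta_n in a
    # system of size n: Poisson with mean n * Delta_n * lam(p).
    counts = rng.poisson(n_dt * lam(np.array([p])))
    lam_hat[i] = counts[0] / n_dt
    # Keep only prices whose estimated consumption is (almost) feasible.
    if (A @ np.array([lam_hat[i]]) * T <= x + delta_n).all():
        r_hat[i] = p * lam_hat[i]

p_hat = grid[np.argmax(r_hat)]            # price for the remaining horizon
print(f"tau_n={tau_n:.3f}, kappa_n={kappa_n}, p_hat={p_hat:.3f}")
```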
Step 2. Here, we derive a lower bound on the expected revenues under the policy π. We will
need the following two lemmas whose proofs are deferred to Appendix B.
Lemma 1 Fix η > 0. Suppose that µj ∈ (0, M), j = 1, ..., d, and rn = n^β with β > 0. Then, if
ǫn = C(η)(log n)^{1/2} rn^{−1/2} with C(η) = 2dη^{1/2} āM^{1/2}, the following holds:

    P( A(N(µrn) − µrn) ≰ rnǫn e ) ≤ C2/n^η,
    P( A(N(µrn) − µrn) ≱ −rnǫn e ) ≤ C2/n^η,

where C2 > 0 is an appropriately chosen constant.
From now on, we fix η ≥ 2 and C1 = 2 max{1, p̄}C(η). Using the previous lemma, we have the
following
Lemma 2 Let Pfn = { pi ∈ P^{κn} : Aλ̂(pi)T ≤ x + δn e }. Then, for a suitably large constant C3 > 0,

    P( r(p̃) − r(p̂) > δn ) ≤ C3/n^η,
    P( p̂ ∉ Pfn ) ≤ C3/n^η.
We define Xn^(L) = Σ_{i=1}^{κn} λ(pi) n∆n and Xn^(P) = λ(p̂) n(T − τn), and put Yn = AN(Xn^(L) + Xn^(P)). In the rest
of the proof, we will use the fact that, given p̂, Yn = Σ_{i=1}^{κn} Aλ̂(pi) n∆n + AN(Xn^(L) + Xn^(P)) − AN(Xn^(L)),
and that N(Xn^(L) + Xn^(P)) − N(Xn^(L)) has the same distribution as N(Xn^(P)). Recalling Comment 1
in the preamble of the appendix, note that Yn is the total potential demand (for resources) under
π if one were never to use p∞, and that one can lower bound the revenues under π as follows:

    Jnπ ≥ E[ p̂ · (N(Xn^(L) + Xn^(P)) − N(Xn^(L))) ] − M̄ e · E[ (Yn − nx)+ ].   (A-2)
The first term on the RHS of (A-2) can be bounded as follows:

    E[ p̂ · (N(Xn^(L) + Xn^(P)) − N(Xn^(L))) ]
      = E[ E[ p̂ · N(λ(p̂)n(T − τn)) | p̂ ] ]
      = E[ r(p̂) ] n(T − τn)
      = { r(p̃) + E[ r(p̂) − r(p̃) | r(p̂) − r(p̃) > −δn ] P( r(p̂) − r(p̃) > −δn )
            + E[ r(p̂) − r(p̃) | r(p̂) − r(p̃) ≤ −δn ] P( r(p̂) − r(p̃) ≤ −δn ) } n(T − τn)      (a)
      ≥ [ r(p̃) − δn − C4/n^η ] n(T − τn),      (b)   (A-3)

where C4 is a suitably large positive constant. Note that (a) follows from conditioning and (b)
follows from Lemma 2 and the fact that r(·) is bounded, say by dM̄M. Let us now examine the
second term on the RHS of (A-2). Let C′ > 0 be a constant to be specified later and set δn′ = C′δn.
    E[ (Yn − nx)+ ] = E[ (Yn − nx)+ | Yn − nx ≤ nδn′ e ] P( Yn − nx ≤ nδn′ e )
                        + E[ (Yn − nx)+ | Yn − nx ≰ nδn′ e ] P( Yn − nx ≰ nδn′ e )
                    ≤ nδn′ e + E[ Yn | Yn ≰ nx + nδn′ e ] P( Yn − nx ≰ nδn′ e ).
Now, for a Poisson random variable Z with mean µ, it is easy to see that E[Z | Z > a] ≤ a + 1 + µ.
In particular, each component of Yn is a Poisson random variable with rate less than nMT, and hence

    E[ Yn | Yn ≰ nx + nδn′ e ] ≤ nx + (nδn′ + 1 + nMT) e.
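As a quick numerical sanity check (illustrative only, not part of the proof), the conditional-mean
bound for Poisson variables used above can be verified by simulation:

```python
# Monte Carlo check of the bound E[Z | Z > a] <= a + 1 + mu for Poisson Z;
# the (mu, a) pairs are arbitrary illustrative choices.
import numpy as np

rng = np.random.default_rng(1)
for mu, a in [(5.0, 8), (20.0, 25), (50.0, 40)]:
    z = rng.poisson(mu, size=1_000_000)
    cond_mean = z[z > a].mean()   # empirical E[Z | Z > a]
    print(f"mu={mu:5.1f}, a={a:3d}: E[Z|Z>a] ~ {cond_mean:7.2f} <= {a + 1 + mu:7.2f}")
```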
Let us now evaluate the probability that the demand for some resource exceeds its capacity by
more than nδn′. Specifically,

    P( Yn − nx ≰ nδn′ e )
      ≤ P( AN(nλ(p̂)(T − τn)) − Anλ(p̂)(T − τn) ≰ (1/3) nδn′ e )
        + P( Aλ(p̂)n(T − τn) ≰ n(x + (δn′/3) e) )
        + P( Σ_{i=1}^{κn} Aλ̂(pi) n∆n ≰ (1/3) nδn′ e ).   (A-4)
Consider the first term on the RHS of (A-4). We have nδn′ > n(T − τn) · 3C(η)(log n)^{1/2}(n(T − τn))^{−1/2}
for n large enough, and hence, if C′ ≥ 3T, one can condition on p̂ and apply Lemma 1 (with µ = λ(p̂),
rn = n(T − τn)) to get

    P( AN(λ(p̂)n(T − τn)) − Aλ(p̂)n(T − τn) ≰ (1/3) nδn′ e )
      ≤ E[ P( AN(λ(p̂)n(T − τn)) − Aλ(p̂)n(T − τn) ≰ n(T − τn)(C1C′/3T)(log n)^{1/2}(n(T − τn))^{−1/2} e | p̂ ) ]
      ≤ C3/n^η.
Consider now the second term on the RHS of (A-4):

    P( Aλ(p̂)n(T − τn) ≰ n(x + (δn′/3) e) )
      = P( A[λ(p̂)T − λ̂(p̂)T] + Aλ̂(p̂)T ≰ (1/(1 − τn/T)) (x + (δn′/3) e) )
      ≤ P( A[λ(p̂)T − λ̂(p̂)T] ≰ (δn′/6) e ) + P( Aλ̂(p̂)T ≰ x + (δn′/6) e )
      = P( A(λ(p̂)n∆nT − λ̂(p̂)n∆nT) ≰ n∆n (δn′/6) e ) + P( Aλ̂(p̂)T ≰ x + (δn′/6) e ).   (A-5)
Suppose that C′ ≥ 6. Then by Lemma 2, the second term above is bounded by C5/n^η for a large
enough choice of C5 > 0. The first term on the RHS of (A-5) is upper bounded by C3/n^η by Lemma
1. Consider the third term on the RHS of (A-4):
    P( Σ_{i=1}^{κn} Aλ̂(pi) n∆n ≰ (1/3) nδn′ e )
      ≤ Σ_{i=1}^{κn} P( Aλ̂(pi) n∆n ≰ (1/(3κn)) nδn′ e )
      = Σ_{i=1}^{κn} P( A[N(λ(pi)n∆n) − λ(pi)n∆n] ≰ n∆n ((1/3)(δn′/τn) e − Aλ(pi)) ).
Now if δn′/τn → ∞ (which holds, for example, if τn = n^{−1/(d+3)} and κn = n^{d/(d+3)}), then for n sufficiently
large we have (1/3)(δn′/τn) e − Aλ(pi) ≥ e for all i = 1, ..., κn, and Lemma 1 yields

    P( Σ_{i=1}^{κn} Aλ̂(pi) n∆n ≰ (1/3) nδn′ e ) ≤ κn C3/n^η ≤ C3/n^{η−1}.
We conclude that with C′ = max{3T, 6} and for some C6 > 0, P( Yn ≰ nx + nδn′ e ) ≤ C6/n^{η−1}, and
in turn

    E[ (Yn − nx)+ ] ≤ nδn′ e + E[ Yn | Yn ≰ nx + nδn′ e ] C6/n^{η−1}.   (A-6)
Combining (A-2), (A-3) and (A-6), we have

    Jnπ ≥ [ r(p̃) − δn − C4/n^η ] n(T − τn) − M̄ nδn′ − M̄ (nx · e + nδn′ + 1 + nMT) C6/n^{η−1}
        = r(p̃)nT − n[ r(p̃)τn + (T − τn)δn + (T − τn) C4/n^η + M̄C′δn + (M̄x · e + MT) C6/n^{η−1} + C′δn C6/n^{η−1} + C6/n^{η−2} ]
        ≥ r(p̃)nT − nC7 [ τn + δn + 1/n^{η−2} ],      (a)

where (a) follows from the fact that δn → 0 and by choosing C7 > 0 suitably large.
Step 3. We now conclude the proof. Note that under the current assumptions, Dλ is convex.
Gallego and van Ryzin (1997, Theorem 1) show that under these conditions the optimal value
of problem (11), say JnD, serves as an upper bound on Jn∗. Note that JnD = nr(p̃)T. Define
f(n) := C7[ τn + δn + 1/n^{η−2} ] and note that f(n) ≥ 0 for all n ≥ 0 and that f(n) → 0 as n → ∞. In
addition, f(n) does not depend on the specific underlying demand function λ. By Comment 2 in the
preamble, JnD ≥ nmD > 0 and hence

    Jnπ/Jn∗ ≥ Jnπ/JnD ≥ 1 − f(n)/mD,

implying that, uniformly over λ ∈ L,

    lim inf_{n→∞} Jnπ/Jn∗ ≥ 1.

This, in conjunction with the inequality Jnπ ≤ Jn∗, completes the proof.
To obtain the rate of convergence stated in (10) in Remark 1, note that the orders of the terms
τn and δn are balanced by choosing τn = n^{−1/(d+3)} and κn = n^{d/(d+3)}. With this choice we have,
for C8 = C7/mD, f(n)/mD = C8[ (log n)^{1/2}/n^{1/(d+3)} + 1/n^{η−1} ], implying that

    sup_{λ∈L} lim sup_{n→∞} (1 − Jnπ/Jn∗) / ((log n)^{1/2} n^{−1/(d+3)}) < ∞.
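The balancing act can also be seen numerically; in the sketch below all constants are set to one
(an assumption made purely for display), and both components of δn/(log n)^{1/2} collapse to the
same order n^{−1/(d+3)} = τn:

```python
# Illustration of the tuning in Remark 1: with tau_n = n^{-1/(d+3)} and
# kappa_n = n^{d/(d+3)}, the terms 1/kappa_n^{1/d} and (n Delta_n)^{-1/2}
# in delta_n are both of order tau_n, so tau_n and delta_n balance.
import numpy as np

d = 2
for n in [10**4, 10**6, 10**8]:
    tau_n = n ** (-1.0 / (d + 3))
    kappa_n = n ** (d / (d + 3.0))
    n_dt = n * tau_n / kappa_n                    # n * Delta_n
    delta_n = np.sqrt(np.log(n)) * max(kappa_n ** (-1.0 / d), n_dt ** -0.5)
    print(f"n={n:>9}  tau_n={tau_n:.5f}  "
          f"delta_n/(log n)^0.5={delta_n / np.sqrt(np.log(n)):.5f}")
```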
Proof of Theorem 2. Fix λ ∈ L and η ≥ 1. Denote by {λ1, ..., λk} the intensities corresponding
to the prices {p1, ..., pk}. Define B to be the matrix with ith column equal to Aλi and let (P0)
denote the following linear optimization problem:

    max { Σ_{i=1}^{k} pi · λi ti : Bt ≤ x, Σ_{i=1}^{k} ti ≤ T, ti ≥ 0, i = 1, ..., k }.

The optimal value of (P0), V∗(P0), is known to be an upper bound on J∗ (cf. Gallego and van Ryzin
(1997, Theorem 1)). For a system with “size” n, the optimal value is just n times the optimal value
of the system with size 1, and the optimal solutions are the same. In what follows, for any feasible
vector t, we use V(P0)(t) to denote the value of the objective function.
Step 1. We first focus on the learning and optimization phases. Let τn be such that τn = o(1)
and nτn → ∞ as n → ∞ at a polynomial rate. Divide [0, τn] into k intervals of equal length ∆n = τn/k
and apply each feasible price during ∆n time units. Let

    λ̂(pi) = [ N(n∆n Σ_{j=1}^{i} λj) − N(n∆n Σ_{j=1}^{i−1} λj) ] / (n∆n),   i = 1, ..., k.
Let (P̂) denote the following linear optimization problem:

    max { Σ_{i=1}^{k} pi · λ̂(pi) ti : Σ_{i=1}^{k} Aλ̂(pi) ti ≤ x − Ae δn, Σ_{i=1}^{k} ti ≤ T − τn, ti ≥ 0, i = 1, ..., k },

where δn := C1(log n)^{1/2}(n∆n)^{−1/2} with C1 > 0 to be specified later. For n sufficiently large, the
feasible set of (P̂) is nonempty (since x − Ae δn ≥ 0) and compact, and hence the problem admits an
optimal solution, say t̂. In what follows, for any feasible vector t, we use V(P̂)(t) to denote the value
of the objective function.
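For concreteness, the following sketch solves one instance of (P̂) with an off-the-shelf LP solver;
the prices, estimated intensities, capacity data, and tuning values are hypothetical placeholders
rather than the paper's calibration. Since linprog minimizes, the revenue objective is negated.

```python
# A minimal sketch of solving the empirical LP (P-hat); all numbers below
# are illustrative assumptions.
import numpy as np
from scipy.optimize import linprog

k, d, ell = 3, 1, 1
T, tau_n, delta_n = 1.0, 0.1, 0.05
x = np.array([0.8])                        # resource capacities
A = np.ones((ell, d))                      # capacity consumption matrix
prices = [np.array([1.0]), np.array([1.5]), np.array([2.0])]
lam_hat = [np.array([0.9]), np.array([0.6]), np.array([0.35])]  # estimates

c = -np.array([p @ lh for p, lh in zip(prices, lam_hat)])  # revenue rates
# Resource constraints: sum_i A lam_hat(p_i) t_i <= x - A e delta_n.
G_res = np.column_stack([A @ lh for lh in lam_hat])
h_res = x - A @ (delta_n * np.ones(d))
# Time constraint: sum_i t_i <= T - tau_n.
G = np.vstack([G_res, np.ones((1, k))])
h = np.concatenate([h_res, [T - tau_n]])

res = linprog(c, A_ub=G, b_ub=h, bounds=[(0, None)] * k)
print("t_hat =", np.round(res.x, 4), " value =", round(-res.fun, 4))
```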
Step 2. Here, we derive a lower bound on the expected revenues under the policy π. Consider
applying the solution t̂ to the stochastic system on the interval (τn, T]. Let M̄ := max{p1, ..., pk}
and define Xn^(L) := Σ_{i=1}^{k} nλ(pi)∆n and Xn^(i) := Σ_{j=1}^{i} nλj t̂j, i = 1, ..., k. Finally, put
Yn = AN(Xn^(L) + Xn^(k)). As noted in the preamble of the appendix, one can lower bound Jnπ as follows:

    Jnπ ≥ E[ Σ_{i=1}^{k} pi · ( N(Xn^(L) + Xn^(i)) − N(Xn^(L) + Xn^(i−1)) ) ] − M̄ e · E[ (Yn − nx)+ ]
        = n Σ_{i=1}^{k} pi · λi E[t̂i] − M̄ e · E[ (Yn − nx)+ ],   (A-7)

where the equality follows from the fact that, given t̂, N(Xn^(L) + Xn^(i)) − N(Xn^(L) + Xn^(i−1)) is
distributed as a Poisson random vector with mean nλi t̂i.
Let H := { ω : max_{1≤i≤k} ‖λi − λ̂(pi)‖T ≤ δn, min_{1≤i≤k} λ̂(pi) ≥ (m1/2)e }. Since revenues are
non-negative, we can lower bound the first sum in (A-7) above as follows:

    Σ_{i=1}^{k} pi · λi E[t̂i] ≥ E[ Σ_{i=1}^{k} pi · λi t̂i | H ] P(H).
Lemma 3 For ω ∈ H, t̂ is feasible for (P0) and, for C2, C3 > 0 suitably large, we have

    V(P0)(t̂) ≥ V(P̂)(t̂) − C2δn,   (A-8)
    V(P̂)(t̂) ≥ V∗(P0) − C3 max{δn, τn}.   (A-9)
We deduce that

    E[ Σ_{i=1}^{k} pi · λi t̂i | H ] = E[ V(P0)(t̂) | H ]
                                   ≥ E[ V(P̂)(t̂) − C2δn | H ]      (a)
                                   ≥ V∗(P0) − (C2 + C3) max{δn, τn},      (b)
where (a) follows from (A-8) and (b) follows from (A-9). We now turn to bound the probability of
the event Hᶜ:

    P(Hᶜ) ≤ P( max_{1≤i≤k} ‖λi − λ̂(pi)‖T > δn ) + P( min_{1≤i≤k} λ̂(pi) < (m1/2)e )      (a)
          ≤ Σ_{i=1}^{k} P( ‖λi − λ̂(pi)‖T > δn ) + Σ_{i=1}^{k} P( λ̂(pi) − λi < (m1/2)e − λi )      (b)
          ≤ Σ_{i=1}^{k} Σ_{j=1}^{d} P( |λi^j − λ̂^j(pi)| > δn/T ) + Σ_{i=1}^{k} Σ_{j=1}^{d} P( λ̂i^j − λi^j < −m1/2 )      (c)
          ≤ C4/n^η,      (d)

where C4 > 0 is suitably large; (a), (b), (c) follow from union bounds and (d) follows from a direct
application of Lemma 1 and the appropriate choice of C1. Hence,
    n Σ_{i=1}^{k} pi · λi E[t̂i] ≥ n[ V∗(P0) − (C2 + C3) max{δn, τn} ][ 1 − C4/n^η ].   (A-10)
We now look into the penalty term, i.e., the second term on the RHS of (A-7). To that end, let
C′ > 0 be a constant to be specified, set δn′ = C′δn, and put E := { ω : Yn − nx ≤ nδn′ e }. Note that

    E[ (Yn − nx)+ ] = E[ (Yn − nx)+ | E ] P(E) + E[ (Yn − nx)+ | Eᶜ ] P(Eᶜ)
                    ≤ nδn′ e + E[ (Yn − nx)+ | Eᶜ ] P(Eᶜ)
                    ≤ nδn′ e + (nδn′ + 1 + nMT) P(Eᶜ) e,      (a)

where (a) follows from the definition of E and the fact that, for a Poisson random variable Z with
mean µ, E[Z | Z > a] ≤ a + 1 + µ. Now,
    P(Eᶜ) = P( Σ_{i=1}^{k} A[ N(Xn^(L) + Xn^(i)) − N(Xn^(L) + Xn^(i−1)) ] + Σ_{i=1}^{k} Anλ̂(pi)∆n ≰ nx + nδn′ e )
          ≤ P( Σ_{i=1}^{k} A[ N(Xn^(L) + Xn^(i)) − N(Xn^(L) + Xn^(i−1)) ] ≰ nx + (1/2) nδn′ e )
            + P( Σ_{i=1}^{k} Anλ̂(pi)∆n ≰ (1/2) nδn′ e ).   (A-11)
Using Lemma 1, the second term on the RHS of (A-11) is seen to be bounded by C5/n^η. On the
other hand, the first term on the RHS of (A-11) can be bounded as follows:
    P( Σ_{i=1}^{k} A[ N(Xn^(L) + Xn^(i)) − N(Xn^(L) + Xn^(i−1)) ] ≰ nx + (1/2) nδn′ e )
      ≤ P( Σ_{i=1}^{k} A[ N(Xn^(L) + Xn^(i)) − N(Xn^(L) + Xn^(i−1)) − nλi t̂i ] ≰ (1/4) nδn′ e )
        + P( Σ_{i=1}^{k} An[λi − λ̂(pi)] t̂i ≰ (1/4) nδn′ e ) + P( Σ_{i=1}^{k} Anλ̂(pi) t̂i ≰ nx ).   (A-12)
Note that the feasibility of t̂ for (P̂) implies that the last term on the RHS above is equal to zero.
With an appropriate choice of C′, Lemma 1 yields that the first two terms on the RHS of (A-12)
are bounded by C6/n^η for C6 > 0 suitably large. We deduce that
    E[ (Yn − nx)+ ] ≤ [ nδn′ + (nδn′ + 1 + nMT)(C5 + C6)/n^η ] e.
Combining the above with (A-7) and (A-10), we have
    Jnπ ≥ n Σ_{i=1}^{k} pi · λi E[t̂i] − M̄ e · E[ (Yn − nx)+ ]
        ≥ n[ V∗(P0) − (C2 + C3) max{δn, τn} ][ 1 − C4/n^η ] − M̄ e · [ nδn′ + (nδn′ + 1 + nMT)(C5 + C6)/n^η ] e
        ≥ nV∗(P0) − C9 n( max{δn, τn} + 1/n^η ).
Step 3. We now conclude the proof. Recalling that mD > 0 bounds V∗(P0) from below for all λ ∈ L,
we have

    Jnπ/Jn∗ ≥ Jnπ/(nV∗(P0)) ≥ 1 − C9( max{δn, τn} + 1/n^η )/mD,

implying that, uniformly over λ ∈ L,

    lim inf_{n→∞} Jnπ/Jn∗ ≥ 1.

This, in conjunction with the inequality Jnπ ≤ Jn∗, completes the proof.

To obtain the rate of convergence stated in (15) in Remark 2, note that the orders of the terms
δn and τn are balanced by choosing τn ≍ n^{−1/3}. With this choice we have

    sup_{λ∈L} lim sup_{n→∞} (1 − Jnπ/Jn∗) / ((log n)^{1/2} n^{−1/3}) < ∞.
B Proofs of Auxiliary Results
In what follows, Ci′, i ≥ 1, will denote positive constants that depend only on A, x, T and the
parameters of the class L, but not on a specific function λ ∈ L.
Proof of Lemma 1. Let Ji = { j ∈ {1, ..., d} : aij ≠ 0 }. We proceed with the following inequalities:

    P( A[N(µrn) − µrn] ≰ rnǫn e )
      ≤ Σ_{i=1}^{ℓ} P( Σ_{j=1}^{d} aij [N(µj rn) − µj rn] > rnǫn )      (a)
      ≤ Σ_{i=1}^{ℓ} Σ_{j∈Ji} P( N(µj rn) − µj rn > rnǫn/(d aij) )
      ≤ ℓ Σ_{j=1}^{d} P( N(µj rn) − µj rn > rnǫn/(dā) )
      ≤ ℓ Σ_{j=1}^{d} exp{ −θj rn (µj + ǫn/(dā)) + (exp{θj} − 1) µj rn },      (b)   (B-1)
where (a) follows from a union bound and (b) follows from the Chernoff bound. The expression in
each of the exponents is minimized for the choice of θj > 0 defined by
    θj = log( 1 + ǫn/(dāµj) ).   (B-2)
Plugging back into (B-1) yields

    P( A[N(µrn) − µrn] ≰ rnǫn e )
      ≤ ℓ Σ_{j=1}^{d} exp{ rn [ −log(1 + ǫn/(dāµj)) (µj + ǫn/(dā)) + ǫn/(dā) ] }
      ≤ ℓd exp{ rn [ −log(1 + ǫn/(dāM)) (M + ǫn/(dā)) + ǫn/(dā) ] }.   (B-3)
For the last inequality, note that the derivative of the term in the exponent with respect to µj
is given by −log(1 + ǫn/(dāµj)) + ǫn/(dāµj), which is always positive for ǫn > 0. Now, using a Taylor
expansion, we get that for some ξ ∈ [0, ǫn/(dāM)],

    −M [ (1 + ǫn/(dāM)) log(1 + ǫn/(dāM)) − ǫn/(dāM) ] = −(1/2) (1/(1 + ξ)) ǫn²/(d²ā²M)
                                                       ≤ −ǫn²/(4d²ā²M),

where the last inequality holds only if ǫn/(dāM) ≤ 1 (which is valid for sufficiently large n).
Substituting for ǫn, we get

    P( A[N(µrn) − µrn] ≰ rnǫn e ) ≤ ℓd exp{ −(C(η))² log n / (4d²ā²M) } = ℓd/n^η.
Hence the first result follows. The other inequality goes through in a similar fashion. This completes
the proof.
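As a sanity check on the Chernoff step (b) in (B-1), the snippet below compares the optimized
bound exp{ rn [ −(µ + ǫn) log(1 + ǫn/µ) + ǫn ] } (the single-coordinate case with d = ā = 1) against
a Monte Carlo estimate of the tail; the values of µ, rn and ǫn are arbitrary illustrative choices.

```python
# Monte Carlo check of the optimized Chernoff bound for a Poisson tail
# (single coordinate, d = a_bar = 1); all parameter values are illustrative.
import numpy as np

rng = np.random.default_rng(2)
mu, r = 1.5, 200.0
for eps in [0.1, 0.2, 0.3]:
    dev = rng.poisson(mu * r, size=1_000_000) - mu * r
    tail = (dev > r * eps).mean()            # empirical P(N(mu r) - mu r > r eps)
    bound = np.exp(r * (-(mu + eps) * np.log(1 + eps / mu) + eps))
    print(f"eps={eps:.1f}: tail={tail:.2e} <= chernoff bound={bound:.2e}")
```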
Proof of Lemma 2. The optimal vector p̃ for the deterministic problem is contained in one of the
hyper-rectangles comprising the price grid. Let pj be the closest vector to p̃ in the price grid. Note
that the index j depends on n, but we do not make the n-dependence explicit to avoid cluttering
the notation. We first show that pj ∈ Pfn with high probability. Note that ‖pj − p̃‖ ≤ C1′/κn^{1/d} for
some C1′ > 0 and hence ‖λ(pj) − λ(p̃)‖ ≤ KC1′/κn^{1/d}. We deduce that

    P( pj ∉ Pfn ) = P( AN(λ(pj)n∆nT) ≰ n∆n(x + δn e) )
                  ≤ P( AN((λ(p̃) + C1′Kκn^{−1/d})n∆nT) ≰ n∆n(x + δn e) )
                  ≤ P( AN((λ(p̃) + C1′Kκn^{−1/d})n∆nT) − A(λ(p̃) + C1′Kκn^{−1/d})n∆nT ≰ n∆n wn ),      (a)

where wn = δn e − C1′KT κn^{−1/d} Ae. Note that (a) is a consequence of the feasibility of p̃ for the
deterministic problem (in particular, Aλ(p̃)n∆nT ≤ n∆n x). Now, since δnκn^{1/d} → ∞, we have that
wn = δn( e − (C1′KT/(δnκn^{1/d})) Ae ) ≥ (δn/2) e for n sufficiently large. By using Lemma 1 (where rn and
ǫn are here n∆n and δn/2, respectively), we deduce that the above probability is bounded above by
C2′/n^η for a sufficiently large C2′ > 0. We then have
    P( r(p̃) − r(p̂) > δn ) ≤ P( r(p̃) − r(p̂) > δn; pj ∈ Pfn, r̂(pj) > 0 ) + P( pj ∉ Pfn ) + P( pj ∈ Pfn, r̂(pj) = 0 ).   (B-4)

Now, under the condition that pj ∈ Pfn, we have

    r(p̃) − r(p̂) = r(p̃) − r(pj) + r(pj) − r̂(pj) + r̂(pj) − r̂(p̂) + r̂(p̂) − r(p̂)
                ≤ r(p̃) − r(pj) + r(pj) − r̂(pj) + r̂(p̂) − r(p̂),

where the inequality follows from the definition of p̂ given in (A-1). For the first term on the RHS
above, note that for C3′ > 0 suitably large,

    |r(pj) − r(p̃)| ≤ |pj · λ(pj) − pj · λ(p̃)| + |pj · λ(p̃) − p̃ · λ(p̃)|
                   ≤ ‖pj‖ ‖λ(pj) − λ(p̃)‖ + ‖λ(p̃)‖ ‖pj − p̃‖      (a)
                   ≤ ‖pj‖ K C1′/κn^{1/d} + ‖λ(p̃)‖ C1′/κn^{1/d}      (b)
                   ≤ C3′/κn^{1/d},      (c)

where (a) follows from the Cauchy–Schwarz inequality, (b) follows from the Lipschitz condition on λ,
and (c) follows from the fact that ‖p‖ ≤ M̄ for all p ∈ Dp. Now, recalling Comment 2 in the preamble
of Appendix A, we have r(p̃)T ≥ mD > 0 and hence, for n sufficiently large, r(pj) > mD/(2T). By
Lemma 1, we deduce that

    P( pj ∈ Pfn, r̂(pj) = 0 ) ≤ P( pj · λ̂(pj) = 0 ) ≤ C4′/n^η.

Coming back to (B-4), since C3′/κn^{1/d} < (1/4)δn for n sufficiently large,

    P( r(p̃) − r(p̂) > δn )
      ≤ P( r(pj) − r̂(pj) > (1/2)δn − C3′/κn^{1/d}; pj ∈ Pfn ) + P( r̂(p̂) − r(p̂) > (1/2)δn; pj ∈ Pfn, r̂(pj) > 0 )
        + P( pj ∉ Pfn ) + P( pj ∈ Pfn, r̂(pj) = 0 )
      ≤ P( r(pj) − r̂(pj) > (1/4)δn ) + P( p̂ · λ̂(p̂) − r(p̂) > (1/2)δn ) + C2′/n^η + C4′/n^η.

By Lemma 1, the first two terms on the RHS above are bounded by C5′/n^η for some C5′ > 0, and
the proof is complete.
Proof of Lemma 3. For ω ∈ H we have

    Σ_{i=1}^{k} Aλi t̂i = Σ_{i=1}^{k} Aλ̂(pi) t̂i + Σ_{i=1}^{k} A(λi − λ̂(pi)) t̂i
                       ≤ x − δn Ae + max_{1≤i≤k} ‖λi − λ̂(pi)‖T Ae      (a)
                       ≤ x,      (b)

where (a) follows from the feasibility of t̂ for (P̂) and the fact that A has non-negative entries, and
(b) follows from the fact that ω ∈ H. We deduce that for ω ∈ H, t̂ is feasible for (P0). In addition,
the value of t̂ in (P0) can be lower bounded as follows (where C1′ > 0 is suitably large):

    V(P0)(t̂) = Σ_{i=1}^{k} pi · λi t̂i
             = Σ_{i=1}^{k} pi · λ̂(pi) t̂i + Σ_{i=1}^{k} pi · (λi − λ̂(pi)) t̂i
             ≥ V(P̂)(t̂) − dM̄k max_{1≤i≤k} ‖λi − λ̂(pi)‖T
             ≥ V(P̂)(t̂) − C1′δn.
On the other hand, consider an optimal solution t∗ to (P0). It is easy to see that such a solution
needs to satisfy Σ_{i=1}^{k} t∗i ≥ T′, where T′ = min_{1≤j≤m} xj/(āM). Indeed, if this were not the case,
then one could strictly improve the objective function by lengthening the time over which one applies
a price yielding positive revenues. In turn, this implies that for at least one i′, t∗i′ ≥ T′/k. Let
ηn = max{τn, C2′δn} with C2′ > 0 suitably large and define t̃ = (t∗ − ηn e)+. Note that t̃i ≤ t∗i for
i = 1, ..., k and t̃i′ = t∗i′ − ηn for n sufficiently large. Hence Σ_{i=1}^{k} t̃i ≤ Σ_{i=1}^{k} t∗i − ηn ≤ T − τn. In
addition, we have for ω ∈ H

    Σ_{i=1}^{k} Aλ̂(pi) t̃i = Σ_{i: t∗i>ηn} Aλ̂(pi) t∗i − Σ_{i: t∗i>ηn} Aλ̂(pi) ηn
                          = Σ_{i: t∗i>ηn} Aλi t∗i + Σ_{i: t∗i>ηn} A(λ̂(pi) − λi) t∗i − Σ_{i: t∗i>ηn} Aλ̂(pi) ηn
                          ≤ x + A( e max_{1≤i≤k} ‖λi − λ̂(pi)‖T − C2′ λ̂(pi′) δn )      (a)
                          ≤ x − Ae δn,      (b)

where (a) follows from the feasibility of t∗ for (P0) and the non-negativity of the elements of A, and
(b) follows from the conditions defining H, the fact that C2′ is chosen sufficiently large, and the fact
that for at least one i, t∗i > ηn. We see that t̃ is feasible for (P̂) (for ω ∈ H). Let C3′ = dM̄Mk. The
value of t̃ in (P̂) can be lower bounded as follows (where C4′ > 0 is suitably large):

    V(P̂)(t̃) = Σ_{i=1}^{k} pi · λ̂(pi) t̃i
             ≥ Σ_{i=1}^{k} pi · λ̂(pi) t∗i − C3′ηn
             = Σ_{i=1}^{k} pi · λi t∗i + Σ_{i=1}^{k} pi · (λ̂(pi) − λi) t∗i − C3′ηn
             ≥ V(P0)(t∗) − dM̄k max_{1≤i≤k} ‖λi − λ̂(pi)‖T − C3′ηn
             ≥ V(P0)(t∗) − C4′ max{δn, ηn}.

Since t̂ is optimal for (P̂) and, for ω ∈ H, t̃ is feasible for (P̂), we have V(P̂)(t̂) ≥ V(P̂)(t̃); as
V(P0)(t∗) = V∗(P0) and ηn = max{τn, C2′δn}, this yields (A-9) after adjusting constants, which
completes the proof.
References
Afèche, P. and Ata, B. (2005), ‘Revenue management in queueing systems with unknown demand
characteristics’, working paper, Northwestern University .
Araman, V. F. and Caldentey, R. A. (2005), ‘Dynamic pricing for non-perishable products with
demand learning’, working paper, New York University .
Auer, P., Cesa-Bianchi, N., Freund, Y. and Schapire, R. E. (2002), ‘The nonstochastic multiarmed
bandit problem’, SIAM Journal on Computing 32, 48–77.
Aviv, Y. and Pazgal, A. (2005), ‘Pricing of short life-cycle products through active learning’,
working paper, Washington University .
Ball, M. and Queyranne, M. (2006), ‘Toward robust revenue management: Competitive analysis of
online booking’, working paper, University of Maryland .
Bertsimas, D. and Perakis, G. (2003), ‘Dynamic pricing; a learning approach’, working paper,
Massachusetts Institute of Technology .
Bitran, G. and Caldentey, R. (2003), ‘An overview of pricing models for revenue management’,
Manufacturing & Service Operations Management 5, 203–229.
Carvalho, A. X. and Puterman, M. L. (2005), ‘Dynamic pricing and learning over short time
horizons’, working paper, University of British Columbia .
Cesa-Bianchi, N. and Lugosi, G. (2006), Prediction, learning, and games, Cambridge University
Press.
Elmaghraby, W. and Keskinocak, P. (2003), ‘Dynamic pricing in the presence of inventory considerations: Research overview, current practices and future directions’, Management Science
49, 1287–1309.
Eren, S., Maglaras, C. and van Ryzin, G. (2006), ‘Pricing and product positioning without market
information’, working paper, Columbia University .
Fisher, M. and Rajaram, K. (2000), ‘Accurate retail testing of fashion merchandise: Methodology
and application’, Marketing Science 19, 226–278.
Foster, D. P. and Vohra, R. (1999), ‘Regret in the on-line decision problem’, Games and Economic
Behavior 29, 7–35.
Gallego, G. and van Ryzin, G. (1994), ‘Optimal dynamic pricing of inventories with stochastic
demand over finite horizons’, Management Science 40, 999–1020.
Gallego, G. and van Ryzin, G. (1997), ‘A multiproduct dynamic pricing problem and its applications
to network yield management’, Operations Research 45, 24–41.
Gaur, V. and Fisher, M. L. (2005), ‘In-store experiments to determine the impact of price on sales’,
Production and Operations Management 14, 377–387.
Hannan, J. (1957), ‘Approximation to bayes risk in repeated play’, Contributions to the Theory of
Games, Princeton University Press III, 97–139.
Iyengar, G. and Zeevi, A. (2005), ‘Effects of parameter uncertainty on the design and control of
stochastic systems’, working paper, Columbia University .
Keller, G. and Rady, S. (1999), ‘Optimal experimentation in a changing environment’, The Review
of Economic Studies 66, 475–507.
Kleinberg, R. and Leighton, F. T. (2003), ‘The value of knowing a demand curve: Bounds on regret
for online posted-price auctions’, Proc. of the 44th Annual IEEE Symposium on Foundations
of Computer Science .
Lai, T. L. and Robbins, H. (1985), ‘Asymptotically efficient adaptive allocation rules’, Advances in
Applied Mathematics 6, 4–22.
Larson, C. E., Olson, L. J. and Sharma, S. (2001), ‘Optimal inventory policies when the demand
distribution is not known’, Journal of Economic Theory 101, 281–300.
Lim, A. and Shanthikumar, J. (2005), ‘Relative entropy, exponential utility, and robust dynamic
pricing’, working paper, University of California Berkeley .
Lobo, M. S. and Boyd, S. (2003), ‘Pricing and learning with uncertain demand’, working paper,
Duke University .
Maglaras, C. and Zeevi, A. (2005), ‘Pricing and design of differentiated services: Approximate
analysis and structural insights’, Operations Research 53, 242–262.
Paschalidis, I. C. and Tsitsiklis, J. N. (2000), ‘Congestion-dependent pricing of network services’,
IEEE/ACM Transactions on Networking 8, 171–184.
Robbins, H. (1952), ‘Some aspects of the sequential design of experiments’, Bull. Amer. Math. Soc.
58, 527–535.
Rusmevichientong, P., Van Roy, B. and Glynn, P. W. (2006), ‘A non-parametric approach to
multi-product pricing’, Operations Research 54, 82–98.
Scarf, H. (1959), ‘Bayes solutions of the statistical inventory problem’, Annals of Mathematical
Statistics 30, 490–508.
Talluri, K. T. and van Ryzin, G. J. (2005), Theory and Practice of Revenue Management, Springer-Verlag.
van Ryzin, G. and McGill, J. (2000), ‘Revenue management without forecasting or optimization:
An adaptive algorithm for determining airline seat protection levels’, Management Science
46, 760–775.
Williams, L. and Partani, A. (2006), ‘Learning through price testing’, 6th Informs Revenue Management Conference, June 2006, http://www.demingcenter.com .