Blind Nonparametric Revenue Management: Asymptotic Optimality of a Joint Learning and Pricing Method∗

Omar Besbes†    Assaf Zeevi‡
Columbia University

First submitted: March 29, 2006. Current version: August 29, 2006.

Abstract

We consider a general class of network revenue management problems in which multiple products are linked by various resource constraints. Demand is modeled as a multivariate Poisson process whose instantaneous rate at each point in time is determined by a vector of prices set by the decision maker. The objective is to price the products so as to maximize expected revenues over a finite sales horizon. The decision maker observes realized demand over time, but is otherwise “blind” to the underlying demand function which maps prices into the instantaneous demand rate. Few structural assumptions are made with regard to the demand function; in particular, it need not admit any parametric representation. We introduce a general method for solving such blind revenue management problems: first a learning phase experiments with a “small” number of prices over an initial “short” time interval; then a simple optimization problem is solved using an estimate of the demand function obtained from the previous stage, and a near-optimal price is fixed for the remainder of the time horizon. To evaluate the performance of the proposed method we compare the revenues it generates to those corresponding to the optimal dynamic pricing policy that knows the demand function a priori. In a regime where the sales volume grows large, we prove that the gap in performance is suitably small; in that sense, the proposed method is asymptotically optimal.

Keywords: Revenue management, pricing, nonparametric estimation, learning, asymptotic optimality.

∗ Research partially supported by NSF grant DMI-0447562.
† Graduate School of Business, e-mail: ob2105@columbia.edu
‡ Graduate School of Business, e-mail: assaf@gsb.columbia.edu

1 Introduction

Revenue management is a growing subfield of operations research which deals with modeling and optimizing complex pricing and demand management decisions. One of the central problems in this area is the so-called dynamic (or tactical) pricing problem, a typical instance of which is the following: given an initial level of inventory and a finite selling horizon, dynamically price the products being sold so as to maximize the total expected revenues. In this problem it is implicitly assumed that there is little or no control over inventory throughout the time period over which sales are allowed, and pricing is the main lever used to optimize profits. There are numerous examples of revenue management problems that fall into this category, arising in industries ranging from fashion and retail to air travel, hospitality and leisure. (For a general overview see Talluri and van Ryzin (2005) and the survey papers by Elmaghraby and Keskinocak (2003) and Bitran and Caldentey (2003).)

Most traditional work in the revenue management area assumes that the functional relationship between the expected demand rate and price, often referred to as the demand function or demand curve, is known to the decision maker, and the only form of uncertainty is due to randomness of demand realizations; in what follows we shall refer to this as the “full information” setting. This assumption is violated in several scenarios that one encounters in practice, where a firm might only possess limited information with regard to the demand function.
This issue is especially relevant for fashion items and technical products with short life cycles, but it also arises in many other instances in which limited historical sales data prohibit accurate inference of the price-demand relationship. In such cases one can view the assumption of “full information” as more of a convenient mathematical abstraction (which facilitates studying the structural properties of optimal pricing policies) than an accurate description of the information available to the decision maker. Relatively few studies in the revenue management literature consider uncertainty with regard to the demand function, and most of this line of work is pursued within the context of a parametric structure in which one or more parameters are not known; more will be said on this in the literature review, which is deferred to Section 2.

The main purpose of this paper is to generalize the traditional full information formulation to account for significant uncertainty in the demand function, and in this context to propose tractable solutions to the dynamic pricing problem. In doing so, we also derive qualitative insights that pertain to the practice of “price testing,” which is in widespread use in a variety of revenue management applications.

We consider a general class of network revenue management problems in which multiple products are linked by various resource constraints. Demand is modeled as a multivariate Poisson process whose instantaneous rate at each point in time is determined by a vector of prices set by the decision maker. (The Poisson process model is adopted primarily for concreteness and simplicity of exposition; see the discussion in Section 6.) The objective is to price the products so as to maximize expected revenues over a finite selling horizon. Unlike the traditional setting of the tactical pricing problem, we assume that the decision maker does not know the true underlying demand function, and is only able to observe realized demand over time. Few assumptions are made with regard to the structure of the underlying demand function; in particular, it need not admit any parametric representation. The decision maker is therefore faced with nonparametric uncertainty, and in that sense s/he is effectively “blind” to the true model.

Consider the optimal revenues in the full information version of the tactical pricing problem described above (note that the latter can be achieved using a dynamic pricing policy which is derived from the underlying demand function using dynamic programming). In the blind setting we consider, where meaningful information characterizing the demand function is absent, the following fundamental question arises: Do there exist blind pricing strategies that compensate for incomplete information by learning the demand behavior “on the fly,” while guaranteeing performance that is close to the maximum achievable in the full information setting? Of course, even the optimal policy that is constructed in the absence of upfront knowledge of the demand function is expected to generate significantly less revenues than its full information counterpart.

Summary of the main results. In this paper we show that the full information benchmark can be approached, which in a sense provides a positive answer to the question posed above. This is done by developing a method in which estimation (demand learning) and optimization (pricing) are executed over consecutive non-overlapping time intervals.
A rough algorithmic description of the method is as follows: i.) a “short” initial segment of the selling horizon is used to learn the demand function by experimenting with a “small” number of prices; ii.) a deterministic relaxation of the revenue maximization problem is formulated based on the estimated demand function; and iii.) a fixed price solution of the optimization problem is used for the remaining selling season (see Algorithms 1 and 2). It is important to note that the approach taken in the learning phase is nonparametric, as the true demand function need not possess any parametric structure. With regard to the performance of these blind pricing policies, we prove that they are nearly optimal in the following sense: the ratio of revenues generated by the proposed policies to those achieved by the optimal dynamic pricing policy in the full information setting approaches one as the sales volume grows large (see Theorems 1 and 2). We also derive bounds on the magnitude of the optimality gap which hold uniformly over the class of admissible demand functions, and illustrate the characteristics and efficacy of the method via several numerical examples. In moderate sized problems, where inventories are of the order of hundreds or a few thousands of units, our proposed method typically achieves in excess of 70% (and may achieve as much as 90%) of the optimal full information revenues.

Main insights and bearing on revenue management practices. One of the main takeaway messages of our study is that the value of knowing the functional relationship between price and demand is, perhaps somewhat surprisingly, not as high as one may have anticipated. In particular, by judiciously combining real-time demand learning and price optimization, the decision maker is able to achieve close to the optimal full information revenues (despite the fact that s/he is given little prior information with regard to the demand function).

A useful insight that arises from our analysis is related to the industry practice of “price testing,” a prevalent method used by firms to address the lack of precise demand information; a recent empirical study of 32 large U.S. retailers finds that nearly 90% of them conduct price experiments (see Gaur and Fisher (2005)). The main idea is quite straightforward and closely related in spirit to the structure of our algorithm: in the first step one experiments with several prices; and in the second step one selects the price(s) that are expected to optimize revenues based on the demand observed in the previous step. Among the main questions that arise in this context are how many prices to test, and for how long. Current practices are mostly guided by ad hoc considerations in addressing such issues. Given the significant role that price testing plays in revenue management practices, there is a growing need to improve the understanding of this approach and add to its rigorous foundations (see, e.g., Williams and Partani (2006) for further discussion and examples). Our analysis contributes to this goal by providing simple and intuitive guidelines for selecting both the number of prices that should be tested, as well as the overall fraction of the selling season that should be dedicated to experimentation. The main takeaway here is that carefully constructed price testing schedules extract sufficient information on the demand model, and hence enable the firm to optimize profits over the entire selling season.
The reader may wonder whether implementing complex price testing schedules is at all feasible in practice. The positive answer to this owes, to a large extent, to an important revolution in retail and manufacturing industries which hinges on the Internet and the Direct-to-Customer model. This business model, which has become synonymous with such industry giants as Dell Computers and Amazon, facilitates the implementation of price changes and removes traditional sources of friction (e.g., catalog printing). Recent advances in information technology and the Internet have also made it simple to test prices across different customer accounts, where each individual account is only able to observe the prices offered to it and not to other accounts. This greatly facilitates learning about customer preferences and sensitivity to price, which is one of the main building blocks in assembling a demand curve structure. (It is worth noting that the use of “testing” ideas is not limited to prices; see Fisher and Rajaram (2000) for a study involving merchandise testing.)

Finally, an important insight that emerges from our study pertains to the risk associated with modeling assumptions. In particular, consider the following dilemma that arises frequently in revenue management applications: should one assume a specific parametric structure for the demand function, or adopt a more elaborate model? The former is simple to calibrate but runs a significant risk of being misspecified relative to the true underlying demand function. The latter is potentially more difficult to estimate, yet may reduce the risk of model misspecification. The current paper demonstrates the feasibility of a learning and pricing approach that provides performance guarantees which hold regardless of the nature of the underlying demand function. This implies that the risk of significant revenue loss due to misspecification can be effectively mitigated if one adopts such nonparametric approaches.

The remainder of the paper. The next section provides a review of the related literature and positions the contributions of the present paper. Section 3 introduces the model and formulates the problem. Section 4 presents the nonparametric pricing algorithm and gives the main results concerning its asymptotic performance. Section 5 provides illustrative numerical examples, and Section 6 contains some concluding remarks and discussion of the modeling assumptions. All proofs are collected in two appendices: Appendix A contains the proofs of the main results and Appendix B contains proofs of auxiliary lemmas.

2 Related Literature and Positioning of the Paper

Blind revenue management falls into a broader category of problems which involve dynamic optimization under incomplete information. Such problems have been the topic of investigation in Operations Management, Computer Science, Game Theory and Economics; we will next briefly survey the most relevant studies in these streams of literature.

Bayesian approaches and parametric uncertainty. The most common framework that is used to study dynamic optimization under incomplete information is a Bayesian modification of the dynamic programming principle. In particular, when uncertainty is couched in parametric terms, the approach works roughly as follows. First, a prior distribution is assumed for the unknown parameters, and the state-space of the dynamic program is augmented to include information accumulated over time.
Then the prior is updated using the observations and the modified Bellman equation is solved. The approach dates back to Scarf (1959), who formalized it in the context of a periodic review inventory problem. Since most revenue management studies assume that the demand function has a known parametric structure (for purposes of deriving qualitative insights), the Bayesian approach articulated above has essentially been the method of choice to incorporate demand function uncertainty. Examples of such work include Aviv and Pazgal (2005) and Araman and Caldentey (2005), both of which restrict attention to a single product problem with one unknown parameter characterizing the demand function; see also Afèche and Ata (2005) for an application in queueing systems and Keller and Rady (1999) for an example in the Economics literature.

A common problem with the Bayesian-based analysis involves the introduction of a new state “variable” that summarizes demand information accumulated over time. This often necessitates an approximation of the value function to avoid the curse of dimensionality. For examples of such an analysis in the context of a single product pricing problem where the demand function is linear, see Lobo and Boyd (2003) and Carvalho and Puterman (2005); the performance of the policies proposed in both studies is only evaluated via simulation. (For a non-Bayesian version of this problem that uses least-squares to update parameter values see Bertsimas and Perakis (2003).)

The Bayesian formulation suffers from additional and more fundamental shortcomings. First and foremost, it essentially restricts the demand function to have a parametric structure. One could argue that in most real-world problems this is a dubious hypothesis. Second, it requires as input a prior distribution, which is typically chosen to support closed form calculation of the posterior (by restricting attention to conjugate families of distributions). This issue is perhaps less crucial in a problem that just involves statistical estimation, where eventually the effect of the prior typically “dissipates.” In contrast, the objective function in the dynamic optimization problem is obtained by taking an expectation relative to the prior distribution, and hence any notion of optimality associated with a Bayesian-based policy hinges on the chosen prior. Consequently, the Bayesian approach is significantly prone to model misspecification, a consequence of both the parametric assumptions and the arbitrary choice of prior.

Nonparametric approaches. There are relatively few studies in the literature that focus on nonparametric approaches. Larson et al. (2001) study a classical stochastic inventory problem where the demand distribution is not known and show that a history dependent (s, S) policy is optimal. In the revenue management literature, the work of van Ryzin and McGill (2000) investigates data-driven adaptive algorithms for determining airline seat protection levels. The absence of parametric assumptions in this case is with respect to the distributions of customers’ requests for each class. Another recent application in revenue management is discussed in Rusmevichientong et al. (2006), who formulate a nonparametric approach to a multiproduct single resource pricing problem. Their work focuses on a static optimization problem in which a car manufacturer seeks fixed prices that maximize revenues based on historical preference data, and does not involve tradeoffs between learning and pricing.
While the studies above are data-driven, some recent studies have also focused on settings where the decision maker lacks any ability to learn. Lim and Shanthikumar (2005) formulate a robust counterpart of the single product revenue management problem of Gallego and van Ryzin (1994), where the uncertainty arises at the level of the point process distribution characterizing the customers’ requests. The authors use a max-min formulation where nature is adversarial at every point in time. This leads to a conservative situation which, in addition, is not prescriptive. (Other approaches to modeling uncertainty include competitive ratio analysis and minimax regret; see for example Eren et al. (2006) and Ball and Queyranne (2006) for references in the revenue management literature.)

Related learning paradigms. The general problem of dynamic optimization with limited or imperfect information has attracted significant attention in several fields; an exhaustive survey is well beyond the scope of this paper. The results discussed next are, at least in spirit, the most related to our work. In economics, a line of work that traces back to Hannan (1957) studies settings where the decision maker faces an oblivious opponent, and the benchmark used to evaluate a given policy is the rewards accumulated by the best possible single action, had the decision maker known in advance the actions of the adversary. (This is somewhat related to the full information benchmark that we employ.) See Foster and Vohra (1999) for an excellent review of work that descends from the Hannan paper and makes illuminating and subtle connections of this work to other fields. Related studies in the computer science literature include Auer et al. (2002), who propose an efficient algorithm for an adversarial version of the multi-armed bandit problem, and Kleinberg and Leighton (2003), who discuss a revenue management application in on-line posted-price auctions. (See also Robbins (1952) and Lai and Robbins (1985).) The problem we study, when viewed through the lens of the previous papers, can be roughly described as a continuous time multi-armed bandit problem with a (multi-dimensional) continuum of possible actions and limited resources. This makes the problem fundamentally different from the traditional multi-armed bandit literature and most adversarial prediction and learning problems (cf. the text by Cesa-Bianchi and Lugosi (2006) for a recent and comprehensive survey).

Summary. The main purpose of our work is to propose an intuitive and tractable approach to a broad class of dynamic optimization problems under nonparametric model uncertainty, and to provide performance guarantees for the proposed methods. To the best of our knowledge, the network revenue management problem studied here has not been addressed to date under any type of demand function uncertainty. In a sense our work is most closely related to that of Gallego and van Ryzin (1997), which develops provably good fixed price heuristics for a broad class of network revenue management problems. Key to their study is the assumption that the demand function is known to the decision maker. Our problem formulation is similar to that of Gallego and van Ryzin (1997) but relaxes the full information assumption made there. An important element in our algorithms is the approximation of the original revenue management problem with a suitable deterministic counterpart; this follows closely the ideas developed in Gallego and van Ryzin (1997).
The separation of estimation and optimization, so that each is performed on a different time scale, is similar to ideas appearing in Iyengar and Zeevi (2005), which studies implications of parameter uncertainty on performance analysis and control of stochastic systems.

3 Problem Formulation

The model and related assumptions. We consider a revenue management problem in which a firm sells d different products which are generated (assembled or produced) from ℓ resources. Let A = [a_ij] denote the capacity consumption matrix, whose entries a_ij ≥ 0, i = 1, …, ℓ and j = 1, …, d, denote the number of units of resource i required to generate product j. It is assumed that the entries of A are integer valued and that A has no zero column. The selling horizon is denoted by T > 0, and after this time sales are discontinued and there is no salvage value for the remaining unsold products.

Demand for products at any time t ∈ [0, T] is given by a multivariate Poisson process with intensity λ_t = (λ_t^1, …, λ_t^d), which measures the instantaneous demand rate (in units such as number of products requested per hour, say). This intensity is determined by the price vector at time t, p(t) = (p^1(t), …, p^d(t)), through a demand function λ : D_p → R_+^d, where D_p ⊆ R_+^d denotes the set of feasible prices. Thus the instantaneous demand rate at time t is λ_t = λ(p(t)), and the realized demand is a controlled Poisson process. We will assume, unless explicitly specified otherwise, that the feasible price set D_p is a compact convex set. Regarding the demand function λ(·), we assume that it has an inverse, denoted γ(·), and that the revenue function r(λ) := λ · γ(λ) is jointly concave; here for two vectors y, z ∈ R^d, y · z denotes the usual scalar product and ‖y‖ := max{|y^i| : i = 1, …, d}. In addition, we assume that the set D_λ := {l : l = λ(p), p ∈ D_p} is convex; these assumptions are quite standard in the revenue management literature (cf. Talluri and van Ryzin (2005)).

We assume that the demand function belongs to the class of functions L := L(K, M, m, p_∞) which satisfy the above and, in addition, for finite positive constants K, M, m and a vector p_∞ ∈ D_p, satisfy:

i.) Boundedness of demand: ‖λ(p)‖ ≤ M for all p ∈ D_p.

ii.) Lipschitz continuity: ‖λ(p) − λ(p′)‖ ≤ K‖p − p′‖ for all p, p′ ∈ D_p.

iii.) Minimum revenue rate: max{p · λ(p) : p ∈ D_p} ≥ m.

iv.) “Shut-off” price: λ(p_∞) = 0.

Assumptions i.)-iii.) are quite benign and hold for many demand function models used in the revenue management literature, such as linear, exponential and iso-elastic (Pareto), as long as the parameters lie in a compact set; see, e.g., Talluri and van Ryzin (2005, §7) for further examples. The existence of a “shut-off” price in Assumption iv.) is not restrictive from a practical standpoint, since in most applications there exists a finite price that yields zero demand. The assumption that p_∞ ∈ D_p is also not overly restrictive. To see that, consider the case where the demand function is separable and λ^j(p) = f^j(p^j) for j = 1, …, d, where the functions f^j(·) are non-increasing and f^j(p_∞^j) = 0. Putting γ^j(l) = inf{p ≥ 0 : λ^j(p) ≤ l}, if we assume that yγ^j(y) is concave for j = 1, …, d, then iv.) is clearly satisfied.
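Before proceeding, it may help to see a concrete member of this class. The snippet below encodes the single-product linear demand model used later in the numerical examples of Section 5 and verifies Assumptions i.)-iv.) numerically on a price grid; the certificate values M, K, m, p_∞ chosen here are specific to this illustration and are not part of the model class definition.

```python
import numpy as np

# Linear demand model lam(p) = (10 - 2p)^+ on D_p = [0.1, 6] (the model
# used for Table 2).  We check Assumptions i.)-iv.) on a fine price grid;
# the constants below are certificates valid for this example only.
lam = lambda p: np.maximum(10.0 - 2.0 * p, 0.0)

p_lo, p_hi = 0.1, 6.0
grid = np.linspace(p_lo, p_hi, 1001)
M, K, m, p_inf = 9.8, 2.0, 12.0, 5.0

assert np.all(lam(grid) <= M)                        # i.)  bounded demand
slopes = np.abs(np.diff(lam(grid))) / np.diff(grid)
assert np.all(slopes <= K + 1e-9)                    # ii.) Lipschitz continuity
assert np.max(grid * lam(grid)) >= m                 # iii.) minimum revenue rate
assert lam(p_inf) == 0.0 and p_lo <= p_inf <= p_hi   # iv.) shut-off price in D_p
print("Assumptions i.)-iv.) verified for this demand model.")
```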
Information structure and the optimization problem. Let (p(t) : 0 ≤ t ≤ T) denote the price process, which is assumed to have sample paths that are right continuous with left limits, taking values in D_p. Let (N^1(·), …, N^d(·)) be a vector of mutually independent unit rate Poisson processes. The cumulative demand for product j up until time t is then given by D^j(t) := N^j(∫_0^t λ^j(p(s)) ds). We say that (p(t) : 0 ≤ t ≤ T) is non anticipating if the value of p(t) at time t ∈ [0, T] is only allowed to depend on past prices {p(s) : s ∈ [0, t)} and demand values {(D^1(s), …, D^d(s)) : s ∈ [0, t)}. (More formally, the price process is adapted to the filtration generated by the past values of the demand process and price process.)

We assume that the decision maker does not know the true demand function, and s/he only knows that λ ∈ L. The decision maker is able to continuously observe realized demand at all time instants, starting at time 0 and up until the end of the selling horizon T. We shall use π to denote a joint learning and pricing policy that maps the above information structure to a non anticipating price process (p(t) : 0 ≤ t ≤ T). With some abuse of terminology, we will use the term policy to refer to the price process and the algorithm that generates it interchangeably. For 0 ≤ t ≤ T put

\[
N^{j,\pi}(t) := N^j\Big(\int_0^t \lambda^j(p(s))\,ds\Big), \qquad j = 1,\ldots,d, \tag{1}
\]

where N^{j,π}(t) denotes the cumulative demand for product j up to time t under the policy π. Let N^π(t) denote the vector (N^{1,π}(t), …, N^{d,π}(t)).

Let x = (x^1, x^2, …, x^ℓ) denote the inventory level of each resource at the start of the selling season. We assume without loss of generality that x^i > 0, i = 1, …, ℓ. A joint learning and pricing policy π is said to be admissible if the induced price process satisfies

\[
\int_0^T A\,dN^\pi(s) \le x \quad \text{a.s.}, \tag{2}
\]
\[
p(s) \in D_p, \quad 0 \le s \le T, \tag{3}
\]

where A is the capacity consumption matrix defined earlier and vector inequalities are assumed to hold componentwise. It is important to note that while the decision maker does not know the demand function, knowledge of p_∞ guarantees that the constraint (2) can be met. Let P denote the set of admissible learning and pricing policies.

The dynamic optimization problem faced by the decision maker under the information structure described above would be: choose π ∈ P to maximize the total expected revenues

\[
J^\pi(x,T;\lambda) := \mathbb{E}\Big[\int_0^T p(s)\cdot dN^\pi(s)\Big]. \tag{4}
\]

There is of course a glaring defect in the above objective: the decision maker is not able to compute the expectation in (4), and hence to evaluate the performance of any proposed policy, since the true demand function governing customer requests is not known a priori. This lends further meaning to the terminology “blind revenue management,” as the decision maker is attempting to optimize (4) in a blind manner. We will revisit the objective of the decision maker shortly.
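The time-change representation D^j(t) = N^j(∫_0^t λ^j(p(s)) ds) also gives a direct recipe for simulating realized demand under any piecewise-constant pricing policy, which is how the policies in this paper operate. The sketch below does this for a single product; the demand function and the price path are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)
lam = lambda p: 10.0 * np.exp(1.0 - p)   # assumed single-product demand function

# A piecewise-constant price path given as (price, duration) segments on [0, T].
segments = [(1.0, 0.25), (2.0, 0.25), (1.5, 0.50)]

# Time change: on a segment of length h at price p the intensity lam(p) is
# constant, so the demand increment is Poisson(lam(p) * h), and the request
# epochs within the segment are i.i.d. uniform (standard Poisson properties).
arrivals, t0 = [], 0.0
for p, h in segments:
    k = rng.poisson(lam(p) * h)
    arrivals.extend(t0 + np.sort(rng.uniform(0.0, h, size=k)))
    t0 += h

print(f"{len(arrivals)} requests; first epochs: {np.round(arrivals[:5], 3)}")
```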
The full information benchmark. When the decision maker knows the demand function λ prior to the start of the selling season, the dynamic optimization problem described in (4) can, at least in theory, be solved; this will be referred to as the full information case. This problem is precisely the one formulated in Gallego and van Ryzin (1997), who also characterize the optimal state-dependent pricing policy using dynamic programming. Suppose that we fix an admissible demand function λ ∈ L. Let P_λ denote the class of policies that “know” the demand function prior to the start of the selling season, and whose corresponding price process is non anticipating and satisfies the admissibility conditions (2)-(3) (in other words, when compared to policies in P, policies in P_λ are allowed to depend on the true value of the demand function λ). Let us define

\[
J^*(x,T\,|\,\lambda) := \sup_{\pi\in\mathcal{P}_\lambda} \mathbb{E}\Big[\int_0^T p(s)\cdot dN^\pi(s)\Big], \tag{5}
\]

where the presence of λ on the left hand side in (5) reflects the fact that the optimization problem is solved “conditioned” on knowing the demand function a priori. The regularity conditions imposed earlier in this section guarantee that there exists a Markovian policy in P_λ that achieves the supremum and hence is optimal (see Gallego and van Ryzin (1997) for details). The term Markovian reflects the fact that the policy will only depend on past demand observations through the current state of inventory.

Performance measures and the main objective. Clearly the value of the full information optimization problem (5) will serve as an upper bound on the value of the original optimization problem described in (4). That is, for any fixed demand function λ ∈ L we have that

\[
\frac{J^\pi(x,T;\lambda)}{J^*(x,T\,|\,\lambda)} \le 1 \quad \text{for all admissible policies } \pi\in\mathcal{P}.
\]

This ratio measures the performance of any admissible policy on a relative scale, expressed as a fraction of the optimal revenues that are achieved with the aid of an oracle that knows the true demand function. One would anticipate the percent loss relative to the full information case to be quite large and the above ratio to be bounded strictly away from 1. As mentioned in the introduction, the main question that this paper addresses is the following: Do there exist policies π ∈ P which do not know the demand function, yet by learning it “on the fly” can achieve near optimal revenues in the sense that J^π(x, T; λ) ≈ J^*(x, T|λ)?

Before attempting to answer this question it is necessary to rule out trivial solutions. In particular, one possible strategy which is not precluded by policies in P is to “guess” the demand function at time t = 0, and then proceed to optimize the objective using this guess. If the guess is correct, this policy can generate the full information optimal revenues J^*(x, T|λ) by following the optimal Markovian pricing scheme. Of course this type of guessing policy will almost never be adopted in practice. One way to exclude it mathematically is to require that the relative performance of a joint learning-pricing policy π ∈ P be measured with respect to the worst possible demand function in the class L, namely

\[
\inf_{\lambda\in\mathcal{L}} \frac{J^\pi(x,T;\lambda)}{J^*(x,T\,|\,\lambda)}. \tag{6}
\]

The criterion in (6) can be viewed as the result of a two step procedure: first the decision maker selects a policy π ∈ P, and then “nature” picks the worst possible demand function λ ∈ L for this particular policy. Measuring performance in this manner guarantees that a “good” policy will perform well regardless of the true underlying demand function. In addition, the decision maker’s objective is now well posed, as s/he can, at least in theory, find the “worst” demand function λ ∈ L corresponding to “each” policy. The fact that such policies can only learn the true demand function by observing realized demand over time introduces an obvious tension between exploration/estimation (demand learning) and optimization (pricing). Returning to the main question posed above, our interest is in constructing suitable learning and pricing policies that make the ratio in (6) large (and ideally close to one).

4 Main Results

The nonparametric pricing algorithm. Before introducing the algorithm we need to define a price grid. Let B_p := ∏_{i=1}^d [p_i, p̄_i] denote the minimum volume hyper-rectangle in R_+^d such that B_p ⊇ D_p.
Given a positive integer κ, one can divide each interval [p_i, p̄_i], i = 1, …, d, into ⌊κ^{1/d}⌋ intervals of equal length. Define the resulting grid of points in R_+^d as B_p^κ. Let e = (1, …, 1) ∈ R^ℓ. The following algorithm defines a class of admissible learning and pricing policies that are parametrized by a triplet of tuning parameters (τ, κ, δ), where τ ∈ (0, T], κ is a positive integer and δ > 0.

Algorithm 1: π(τ, κ, δ)

Step 1. Initialization:
(a) Set the learning interval to be [0, τ], and set κ to be the number of prices to experiment with. Put ∆ = τ/κ.
(b) Define P^κ = {p_1, …, p_κ} to be the prices to experiment with over [0, τ], where P^κ ⊇ B_p^κ ∩ D_p.

Step 2. Learning/experimentation:
(a) On the interval [0, τ] apply p_i from t_{i−1} = (i−1)∆ to t_i = i∆, i = 1, 2, …, κ, as long as inventory is positive for all resources. If some resource is out of stock, apply p_∞ up until time T and STOP.
(b) Compute

\[
\hat d(p_i) = \frac{\text{total demand over } [t_{i-1}, t_i)}{\Delta}, \qquad i = 1,\ldots,\kappa. \tag{7}
\]

Step 3. Optimization:
For i = 1, …, κ,
  If A d̂(p_i)T ≤ x + δe, then r̂(p_i) = p_i · d̂(p_i)
  else r̂(p_i) = 0.
  End If
End For
Set p̂ = arg max{r̂(p) : p ∈ P^κ}.

Step 4. Pricing: On the interval (τ, T] apply p̂ until some resource is out of stock, then apply p_∞ for the remaining time.

We will describe the intuition behind the construction of Algorithm 1 in the discussion following Theorem 1. Regarding Step 4, it is clear that any practical implementation of this algorithm would not “shut off” all the demand once only one resource becomes unavailable, but would rather do so only for those products that use the unavailable resource. The result we present in Theorem 1 is valid for policies that improve upon the above algorithm by refining Step 4 through partial and/or gradual demand “shut off.”
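The following is a minimal single-product simulation sketch of Algorithm 1 (d = ℓ = 1 and A = 1, so inventory is counted in product units). The exponential demand model, the scale n and all constants are illustrative assumptions; the tuning parameters follow the prescription (9) of Remark 1 below with C = 1.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative problem data (assumptions, not prescribed by the paper).
n = 1000                                        # scale of inventory and demand
x = 5.0 * n                                     # initial inventory; A = 1, d = 1
T = 1.0                                         # selling horizon
demand = lambda p: n * 10.0 * np.exp(1.0 - p)   # unknown to the algorithm

# Tuning parameters in the spirit of Remark 1 (d = 1, C = 1).
tau = T * n ** (-1 / 4)                         # learning horizon ~ n^{-1/(d+3)}
kappa = max(2, round(n ** (1 / 4)))             # number of test prices ~ n^{d/(d+3)}
delta = n * np.sqrt(np.log(n)) * n ** (-1 / 4)  # feasibility slack

# Steps 1-2: experiment with each grid price for Delta time units.
grid = np.linspace(0.1, 6.0, kappa)             # price grid over D_p = [0.1, 6]
Delta = tau / kappa
counts = rng.poisson(demand(grid) * Delta)      # observed demand per test price
d_hat = counts / Delta                          # empirical demand rates, cf. (7)

# Step 3: among prices whose implied consumption is almost feasible,
# maximize the empirical revenue rate p * d_hat(p).
r_hat = np.where(d_hat * T <= x + delta, grid * d_hat, 0.0)
p_hat = grid[np.argmax(r_hat)]

# Step 4: apply p_hat on (tau, T] until the inventory runs out.
# (Stock-outs during the short learning phase are ignored for brevity.)
remaining = max(0.0, x - counts.sum())
sales = min(remaining, rng.poisson(demand(p_hat) * (T - tau)))
revenue = float((grid * counts).sum() + p_hat * sales)
print(f"chosen price {p_hat:.2f}, simulated revenue {revenue:.0f}")
```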
Asymptotic analysis. We consider a regime in which both the number of initial resources (or capacity) as well as the potential demand grow proportionally large. In particular, for any positive integer n, the initial resource vector is now assumed to be x_n = nx and the demand function is λ_n(·) = nλ(·). Thus, n determines both the order of magnitude of inventories and the rate of demand; when n is large this scaling characterizes a regime with a high volume of sales. We will denote by P_n the set of admissible policies for a system with scale n, and the expected revenues under a policy π_n ∈ P_n will be denoted J_n^π(x, T; λ). With some abuse of notation we will occasionally use π to denote a sequence {π_n : n = 1, 2, …} as well as any element of the sequence, omitting the subscript “n” to avoid cluttering the notation. For each n = 1, 2, …, let J_n^*(x, T|λ) denote the optimal revenues that can be achieved in the full information case, i.e., when the demand function is known a priori. Of course, for all n = 1, 2, …, we have that J_n^π(x, T; λ) ≤ J_n^*(x, T|λ). With this in mind, the following definition characterizes admissible policies that have “good” asymptotic properties.

Definition 1 (Asymptotic Optimality) A sequence of admissible policies π_n ∈ P_n is said to be asymptotically optimal if

\[
\inf_{\lambda\in\mathcal{L}} \frac{J_n^\pi(x,T;\lambda)}{J_n^*(x,T\,|\,\lambda)} \to 1 \quad \text{as } n\to\infty. \tag{8}
\]

Asymptotically optimal policies are those that achieve the full information upper bound on revenues as n → ∞, uniformly over the class of admissible demand functions.

For the purpose of asymptotic analysis we use the following notation: for real valued positive sequences {a_n} and {b_n} we write a_n = O(b_n) if a_n/b_n is bounded from above for large enough values of n (i.e., lim sup a_n/b_n < ∞). If a_n/b_n is also eventually bounded away from zero (i.e., lim inf a_n/b_n > 0), then we write a_n ≍ b_n.

Theorem 1 There exists a sequence of policies {π(τ_n, κ_n, δ_n)} defined by Algorithm 1 that is asymptotically optimal.

Remark 1 (rates of convergence) The proof of the above result is constructive in that it exhibits a set of tuning parameters such that π_n = π(τ_n, κ_n, δ_n) is asymptotically optimal. In particular, for

\[
\tau_n \asymp n^{-1/(d+3)}, \qquad \kappa_n \asymp n^{d/(d+3)}, \qquad \delta_n = C\,n(\log n)^{1/2}\, n^{-1/(d+3)}, \tag{9}
\]

with C > 0 sufficiently large, the convergence rate in (8) is given by

\[
\sup_{\lambda\in\mathcal{L}}\bigg\{1 - \frac{J_n^\pi(x,T;\lambda)}{J_n^*(x,T\,|\,\lambda)}\bigg\} = O\bigg(\frac{(\log n)^{1/2}}{n^{1/(d+3)}}\bigg) \quad \text{as } n\to\infty, \tag{10}
\]

where d denotes the number of products.

The existence of this sequence of policies states that the value of full information diminishes for large n. The choice of tuning parameters in (9) optimizes the rate of convergence, as can be seen from the last step in the proof. Note that τ_n is shrinking as n gets large, and in that sense the learning horizon is “short” relative to the sales horizon [0, T]. Ignoring logarithmic terms, we can rewrite (10) informally as J_n^π(x, T; λ)/J_n^*(x, T|λ) ≈ 1 − Cτ_n/T. Hence the loss in revenues is proportional to the relative size of the learning horizon.
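To give a feel for the prescription (9), the snippet below evaluates the tuning parameters for a few problem scales, with d = 1, T = 1 and all proportionality constants set to one; the values therefore need not coincide with those used in the tables of Section 5, which involve additional grid rounding.

```python
import math

def tuning(n: int, d: int, C: float = 1.0, T: float = 1.0):
    """Tuning parameters from (9): learning horizon tau_n, number of test
    prices kappa_n, and feasibility slack delta_n (constants illustrative)."""
    tau = T * n ** (-1 / (d + 3))
    kappa = round(n ** (d / (d + 3)))
    delta = C * n * math.sqrt(math.log(n)) * n ** (-1 / (d + 3))
    return tau, kappa, delta

for n in (10**2, 10**3, 10**4):
    tau, kappa, delta = tuning(n, d=1)
    print(f"n = {n:>6}: tau = {tau:.3f}, kappa = {kappa}, delta = {delta:.1f}")
```

Note how the learning horizon shrinks like n^{-1/4} while the number of test prices grows, in line with the discussion above.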
Intuition and proof sketch. Step 1 in Algorithm 1 consists of setting the first two tuning parameters: τ_n determines the length of the interval used for learning the demand function, and κ_n sets the number of prices that are experimented with on that interval. In Step 2, prices in the set P^{κ_n} are used to obtain an approximation of the demand function, which as n grows large becomes increasingly accurate due to the strong law of large numbers. To understand Step 3, imagine that the demand function λ(·) were known a priori, and demand were deterministic rather than governed by a Poisson process. The revenue maximization problem would then be given by the following deterministic dynamic optimization problem

\[
\max\Big\{ \int_0^T r(\lambda(p(s)))\,ds \;:\; \int_0^T A\lambda(p(s))\,ds \le x,\;\; p(s)\in D_p \text{ for all } s\in[0,T] \Big\}. \tag{11}
\]

Gallego and van Ryzin (1997) show that the solution to (11) is constant over time. Their work also establishes that this fixed price yields close to optimal performance when used in the original stochastic system (in the asymptotic regime under consideration). This sheds some light on the optimization problem articulated in Step 3 of the algorithm: based on the observations one forms an estimate of the revenue function and then proceeds to solve an empirical version of the deterministic problem (11), using this solution for the remainder of the time horizon. The choice of the tuning parameter δ_n allows constraints to be violated, taking into account the estimation “noise.” This avoids restricting too drastically the set of candidate prices in Step 4.

The choice of the key tuning parameters, τ_n and κ_n, is meant to balance several contradicting objectives. Increasing τ_n results in a longer time horizon over which the demand function is estimated; however, by doing so there is also a potential loss in revenues that stems from spending “too much time” on learning and exploration. In addition, for every fixed choice of τ_n, there is an inherent tradeoff between increasing the number of prices to experiment with, κ_n, and the accuracy of estimating the demand function on this price grid, which is dictated by the length ∆_n = τ_n/κ_n. We now analyze how these parameters impact the ratio J_n^π/J_n^*. The first error source can be interpreted as an “exploration bias” that is due to experimenting with various prices without using any information about the demand. This will result in potential losses of order τ_n. The second error source is deterministic and stems from using only a finite number of prices to search for the optimal solution of (11) (since the demand function is assumed to be K-Lipschitz); the maximal loss related to this error is of order 1/κ_n^{1/d}. The last source of error is stochastic, arising from the fact that only noisy observations of the demand function are available. Since each price is held fixed for ∆_n = τ_n/κ_n units of time, this introduces an error of order (nτ_n/κ_n)^{-1/2}; this observation is less transparent and is rigorously detailed in the proof using uniform probability bounds for deviations of random variables from their expectation. The overall error is simply the sum of the three sources detailed above, namely

\[
1 - J_n^{\pi}/J_n^{*} \approx C\bigg[\tau_n + \frac{1}{\kappa_n^{1/d}} + \frac{\kappa_n^{1/2}}{(n\tau_n)^{1/2}}\bigg]. \tag{12}
\]

This last expression captures mathematically the tension that must be resolved in choosing the tuning parameters associated with Algorithm 1. Balancing the three error terms in (12), that is, setting τ_n ≍ κ_n^{-1/d} ≍ (κ_n/(nτ_n))^{1/2}, gives τ_n ≍ n^{-1/(d+3)} and κ_n ≍ n^{d/(d+3)}, and yields the rate of convergence in (10).

Price restricted case. Algorithm 1 and the proof of its asymptotic optimality rely heavily on the fact that the set of feasible prices D_p is convex. In many applications this assumption is not satisfied and the set of feasible prices may be discrete, e.g., D_p^k = {p_1, …, p_k}. (The reasons for such constraints can be industry practices, competition, etc.) We continue to define P as the set of admissible learning and pricing policies when the demand function is unknown, while P_λ denotes the set of pricing policies when λ is known a priori. Here J_n^*(x, T|λ) is to be interpreted as the optimal revenues in the full information case when prices are restricted to the discrete price set D_p^k. Under the following technical condition, a result similar to Theorem 1 can be established here.

Assumption 1 There exists a constant m_1 > 0 such that any function λ ∈ L is bounded below by m_1 > 0 on D_p^k \ {p_∞}, i.e.,

\[
\inf_{\lambda\in\mathcal{L}}\ \min_{p\in D_p^k\setminus\{p_\infty\}}\ \min_{j=1,\ldots,d}\ \lambda^j(p) > m_1.
\]

We introduce below an algorithm for the multiproduct price restricted case. The intuition behind its construction is similar to the one underlying Algorithm 1. What distinguishes it from the latter is the following: i) there are only two tuning parameters (τ, δ), since the set of feasible prices is discrete; and ii) the deterministic problem that is solved in Step 3 can be formulated as a linear program whose solution prescribes the amount of time that each price is used (a numerical sketch of this step is given after the algorithm).

Algorithm 2: π(τ, δ)

Step 1. Initialization: Set the learning interval to be [0, τ] and put ∆ = τ/k.

Step 2. Learning/experimentation:
(a) On the interval [0, τ] apply p_i from t_{i−1} = (i−1)∆ to t_i = i∆, i = 1, 2, …, k, as long as inventory is positive for all resources. If some resource is out of stock, apply p_∞ up until time T and STOP.
(b) Compute

\[
\hat d(p_i) = \frac{\text{total demand over } [t_{i-1}, t_i)}{\Delta}, \qquad i = 1,\ldots,k. \tag{13}
\]

Step 3. Optimization: Let t̂ = (t̂_1, …, t̂_k) be the solution of the linear program

\[
\max\Big\{ \sum_{i=1}^{k} p_i\cdot \hat d_i\, t_i \;:\; \sum_{i=1}^{k} A\hat d_i\, t_i \le x - Ae\delta,\;\; \sum_{i=1}^{k} t_i \le T - \tau,\;\; t_i \ge 0,\; i = 1,\ldots,k \Big\}. \tag{14}
\]

Step 4. Pricing: For each i = 1, …, k, apply p_i for t̂_i time units on (τ, T] until some resource is out of stock, then apply p_∞ for the remaining time.
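As a concrete instance of Step 3, the sketch below solves the linear program (14) with scipy for a small synthetic input; the capacity matrix matches the network example of Section 5, but the price set, the estimated rates d̂_i and all remaining constants are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import linprog

# Illustrative inputs: 2 products, 3 resources, k = 3 candidate price vectors.
A = np.array([[1, 1], [3, 1], [0, 5]])            # capacity consumption matrix
prices = [np.array([1.1, 2.0]), np.array([4.0, 4.0]), np.array([0.5, 3.0])]
d_hat = [np.array([3.9, 5.0]), np.array([1.0, 1.0]), np.array([4.5, 3.0])]
x = np.array([15.0, 8.0, 30.0])                   # initial resource inventories
T, tau, delta = 1.0, 0.1, 0.05                    # horizon, learning time, slack

# Problem (14): maximize sum_i (p_i . d_hat_i) t_i subject to the resource
# and total-time constraints; linprog minimizes, so the objective is negated.
c = -np.array([p @ d for p, d in zip(prices, d_hat)])   # revenue rate per price
A_ub = np.column_stack([A @ d for d in d_hat])          # resource usage rates
A_ub = np.vstack([A_ub, np.ones(len(prices))])          # total-time constraint
b_ub = np.concatenate([x - delta * A @ np.ones(2), [T - tau]])

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * len(prices))
print("time per price:", np.round(res.x, 3), "| LP revenue:", round(-res.fun, 2))
```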
We now analyze the performance of Algorithm 2 in the context of the asymptotic regime introduced earlier in this section.

Theorem 2 Suppose that the set of prices is D_p^k = {p_1, …, p_k} and let Assumption 1 hold. Then there exists a sequence of policies {π(τ_n, δ_n)} defined by Algorithm 2 that is asymptotically optimal.

Remark 2 (rates of convergence) Setting τ_n ≍ n^{-1/3} and δ_n = Cn(log n)^{1/2} n^{-1/3} with C > 0 sufficiently large, we get that

\[
\sup_{\lambda\in\mathcal{L}}\bigg\{1 - \frac{J_n^\pi(x,T;\lambda)}{J_n^*(x,T\,|\,\lambda)}\bigg\} = O\bigg(\frac{(\log n)^{1/2}}{n^{1/3}}\bigg) \quad \text{as } n\to\infty. \tag{15}
\]

We note that, in contrast with (10), the rate of convergence in this case does not depend on the number of products (d).

5 Illustrative Numerical Examples

Note that J_n^*(x, T|λ) is not readily computable in most cases; however, an upper bound is easy to obtain through the value of the deterministic optimization problem given in (11). In addition, this upper bound is fairly tight for moderate sized problems (see Gallego and van Ryzin (1997)); hence one can compute a “good” lower bound on the ratio J_n^π(x, T; λ)/J_n^*(x, T|λ) based on this deterministic relaxation. The results for the learning and pricing policies π_n depicted in Tables 1-4 are based on running 500 independent simulation replications, from which the performance indicators were derived by averaging. The standard error for estimating J_n^π(x, T; λ)/J_n^*(x, T|λ) was below 0.5% in all cases. (The sketch below indicates how such a ratio estimate is assembled.)
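The sketch below illustrates this evaluation methodology on a single-product instance: it computes the deterministic benchmark J_n^D by a one-dimensional search and estimates the ratio by averaging simulated revenues over replications. For brevity, a naive fixed-price policy stands in for Algorithm 1; all numbers are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Deterministic upper bound J^D for one product (A = 1): the best fixed
# price subject to the capacity constraint lam(p) * T <= x, cf. (11).
n, x, T = 1000, 5.0, 1.0                       # scale and normalized inventory
lam = lambda p: 10.0 * np.exp(1.0 - p)         # demand function (known here)

neg_rev = lambda p: -p * min(lam(p), x / T)    # cannot sell beyond capacity
res = minimize_scalar(neg_rev, bounds=(0.1, 6.0), method="bounded")
J_D = n * (-res.fun) * T                       # J_n^D = n * J^D

# Ratio estimate: average simulated policy revenues over replications.
# A naive fixed price p = 2 is used here in place of the learning policy.
rng = np.random.default_rng(1)
rev = [2.0 * min(n * x, rng.poisson(n * lam(2.0) * T)) for _ in range(500)]
print(f"estimated lower bound on the ratio: {np.mean(rev) / J_D:.3f}")
```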
Example 1 (Single product example) We start with a single product example, i.e., d = ℓ = 1, and the matrix A is just a scalar equal to 1 (i.e., the inventory is counted in units of this product). We consider two underlying demand models: an exponential and a linear model. The parameters used to generate the results depicted in Table 1 are as follows: λ(p) = 10 exp(1 − p). In Table 2, we took λ(p) = (10 − 2p)^+. For both cases, the set of feasible prices was taken to be D_p = [0.1, 6] and the time horizon was taken to be T = 1.

In both tables, we present the ratio of the performance of the policy π defined through Algorithm 1 to the optimal performance in the full information case for three problem sizes (n) and seven normalized inventory sizes (x). In each case, we indicate the number of prices used by the policy κ, the time dedicated to learning τ, and the proportion of initial products sold during the learning phase α. (The tuning parameters are defined via (9) with C = 1 for each n.)

            n = 10^2              n = 10^3              n = 10^4
            (κ = 5, τ = 0.31)     (κ = 7, τ = 0.17)     (κ = 12, τ = 0.10)
   x        Jnπ/Jn∗     α         Jnπ/Jn∗     α         Jnπ/Jn∗     α
   3         .42       84          .69       39          .91       18
   5         .61       51          .78       23          .95       11
   7         .78       36          .68       17          .85        8
   9         .74       28          .90       13          .94        6
  11         .74       23          .90       11          .93        5
  13         .74       18          .90        9          .93        4
  15         .74       17          .90        8          .93        4

Table 1: Exponential demand function. Jnπ/Jn∗ represents a lower bound on the ratio of the performance of the policy π(τ, κ, δ) to the optimal performance in the full information case, and x is the normalized inventory level. Here κ = number of prices used by the policy π; τ = fraction of time allocated to learning; and α = proportion of inventory sold during the learning phase (in %).

            n = 10^2              n = 10^3              n = 10^4
            (κ = 5, τ = 0.31)     (κ = 7, τ = 0.17)     (κ = 12, τ = 0.10)
   x        Jnπ/Jn∗     α         Jnπ/Jn∗     α         Jnπ/Jn∗     α
   2         .45       85          .72       44          .83       23
   3         .59       57          .73       29          .87       15
   4         .77       42          .90       22          .89       11
   5         .83       33          .95       18          .95        9
   6         .83       28          .88       15          .95        8
   7         .83       24          .88       12          .95        7
   8         .82       21          .88       11          .95        6

Table 2: Linear demand function. Jnπ/Jn∗ represents a lower bound on the ratio of the performance of the policy π(τ, κ, δ) to the optimal performance in the full information case, and x is the normalized inventory level. Here κ = number of prices used by the policy π; τ = fraction of time allocated to learning; and α = proportion of inventory sold during the learning phase (in %).

We observe that with inventory levels of the order of a few thousands, the expected revenues under the proposed policy are close to 90% of the expected revenues in the full information case, where knowledge of the demand function enables us to optimally solve the dynamic pricing problem. Note that for inventories of the order of thousands, the policy utilizes approximately 17.7% of the time horizon T to learn the demand function and experiments with only 7 prices. It might also seem surprising that the performance of the algorithm does not necessarily improve with the inventory size. For example, the entries for x = 7 and x = 9 in Table 1 show that the performance of the algorithm is relatively better in the first case (.78 versus .74 for n = 10^2). When analyzing these results, one should keep in mind that both the numerator and the denominator in the ratio Jnπ/Jn∗ vary, and hence there is no particular reason to expect monotonic behavior of the ratio. In addition, as the initial inventory changes, the price sought after in Step 3 of Algorithm 1 varies and might be further away from one of the points contained in the price grid P^{κ_n}.

Example 2 (Network example) We consider now an example with two products and three resources where the underlying demand is separable. In particular, we use the following demand model: λ(p^1, p^2) = (5 − p^1, 9 − 2p^2)′. The set of feasible prices is D_p = [0.5, 5.5] × [0.5, 5] and T = 1. The first, second and third rows of the capacity consumption matrix A are given by (1, 1), (3, 1) and (0, 5), respectively. For example, this means that product 1 requires 1 unit of resource 1, 3 units of resource 2 and no units of resource 3. In Table 3, we give performance results for the policy defined by Algorithm 1 (with tuning parameters given in (9) with C = 1 for each n).

                   n = 10^2             n = 10^3             n = 10^4
                   (κ = 10, τ = 0.40)   (κ = 17, τ = 0.25)   (κ = 50, τ = 0.16)
   x               Jnπ/Jn∗              Jnπ/Jn∗              Jnπ/Jn∗
   ( 3,  5,  7)     .64                  .72                  .73
   (15,  5,  7)     .64                  .73                  .74
   (15,  8,  7)     .55                  .77                  .72
   (15,  8, 30)     .71                  .81                  .88
   (15, 12, 30)     .65                  .86                  .90

Table 3: Network example with linear demand function. Jnπ/Jn∗ represents a lower bound on the ratio of the performance of the policy π(τ, κ, δ) to the optimal performance in the full information case, and x is the normalized inventory level. Here κ = number of prices used by the policy π; τ = fraction of time allocated to learning.

Contrasting with the single product example, we clearly see the effects of dimensionality coming into play. For problem sizes of the order of 10^3, the number of price points tested grows from 7 in the single product case to 17 in the two product case. In conjunction, the time allocated to learning increases from 17% of the selling season to 25% of the selling season.
Focusing on performance, we note that for problems whose size is of the order of thousands, the performance of the proposed policy exceeds 72% of the optimal full information revenues for all initial inventory vectors tested.

Example 3 (Network example in the price restricted case) The parameters of the demand function are taken to be as in Example 2; however, now the set of prices is restricted to the finite set D_p^5 = {(0.5, 3), (0.5, 0.5), (1.1, 2), (4, 4), (4, 6.5)}. In Table 4, we illustrate the performance of the policies defined by Algorithm 2 with τ_n = n^{-1/3} and δ_n = (5 log n)^{1/2} n^{-1/3}.

                   n = 10^2       n = 10^3       n = 10^4
                   (τ = 0.22)     (τ = 0.10)     (τ = 0.05)
   x               Jnπ/Jn∗        Jnπ/Jn∗        Jnπ/Jn∗
   ( 3,  5,  7)     .41            .71            .93
   (15,  5,  7)     .41            .72            .93
   (15,  8,  7)     .72            .87            .95
   (15,  8, 30)     .82            .91            .95
   (15, 12, 30)     .83            .92            .96

Table 4: Network example (price restricted case) with linear demand function. Jnπ/Jn∗ represents a lower bound on the ratio of the performance of the policy π(τ, δ) to the optimal performance in the full information case, and x is the normalized inventory level. Here τ = fraction of time allocated to learning.

The results in Table 4 show that the policies consistently exceed 93% of the full information benchmark for n = 10^4, illustrating the faster convergence rate claimed in Remark 2 (in comparison to that in Remark 1). It is also interesting to note that such performance is reached by allocating only 5% of the selling horizon to the learning phase.

6 Concluding Remarks

The curse of dimensionality and efficiency of the algorithms. When prices are restricted to a discrete set, the asymptotic performance of Algorithm 2, measured in terms of the rate of convergence given in Remark 2, is independent of the number of products being priced. In contrast, when the price set is not discrete, one needs to experiment with sufficiently many price combinations to “cover” the domain of the unknown demand function. This approach suffers from the curse of dimensionality, evident in the rate guarantee given in Remark 1 (following Theorem 1), which degrades as the number of products d increases. The numerical results in Section 5 clearly illustrate the difference between the price restricted and unrestricted cases with regard to these dimensionality effects. This problem will persist in any scheme that involves static sampling of the price domain, and one would have to resort to adaptive methods in order to improve performance. If one restricts the class of demand functions by imposing further smoothness assumptions, then it is possible to “mitigate” the curse of dimensionality. Essentially, as smoothness increases, one needs fewer points to construct a good approximation of the demand function. This direction is very appealing from a practical implementation perspective.

On the Poisson process assumption. We have made the assumption that requests for products arrive according to a Poisson process whose rate is the underlying demand function (evaluated at the given price). As stated already in the introduction, this assumption is made primarily for concreteness and in order to keep technical details to a bare minimum. In essence, the notion of asymptotic optimality we advocate in this paper only relies on a rough estimate of the rate of convergence in the strong law of large numbers. Thus, parallel results to the ones given in Section 4 can be derived under far more general assumptions on the underlying point process that governs demand.
Extensions. Our approach hinges on the fact that the revenue management problem being discussed can be “well approximated” by an appropriate deterministic relaxation which admits a simple solution. This is encoded in Step 3 of both algorithms described in this paper. Roughly speaking, this ensures that a static fixed price nearly maximizes revenues in the full information case (cf. Gallego and van Ryzin (1994, 1997)). Problems that admit such structure appear in numerous other contexts (for examples that focus on pricing in service systems see, e.g., Paschalidis and Tsitsiklis (2000) and Maglaras and Zeevi (2005)); hence the techniques developed in this paper may prove useful in those problems as well.

Adaptive algorithms. One aspect that has not been addressed in the present paper is that of implementation. Even though the performance of the proposed algorithms was shown to be near-optimal, the decision maker will not necessarily wish to fully separate the learning phase from the pricing phase in the manner prescribed by Algorithms 1 and 2. In particular, it may be appealing to make the learning adaptive, so that only relevant regions of the feasible price set are explored in the experimentation phase. That is, the estimation and optimization stages might be pursued simultaneously rather than sequentially. Adaptive schemes can be used to perform a better localized search for the near optimal fixed price, and can also exploit further smoothness assumptions characterizing the unknown demand function to reduce the optimality gap.

Exploiting parametric assumptions. Consider a scenario in which the demand function has a known parametric structure, with parameter values that are unknown to the decision maker. The obvious question here is whether one can construct algorithms which exploit this information and achieve better performance, relative to the nonparametric methods studied in this paper. The main point to consider here is that any algorithm that relies on parametric assumptions involves significant model risk, as the true demand function may not (and most likely will not) belong to a parametric family. Addressing the above question would provide a means for rigorously quantifying the “price” that one pays for eliminating model misspecification risk.

Treating time-inhomogeneities. The question here is the following: how should a firm jointly learn and price when the demand function is unknown and time dependent? It stands to reason that if one wants to capture a rich time inhomogeneous structure, then one would need to resort to nonparametric approaches. The method developed in this paper hopefully provides a first step in this direction.

A Proofs of Main Results

Notation. In what follows, if x and y are two vectors, x ≰ y if and only if x^i > y^i for at least one component i; x^+ will denote the vector in which the ith component is max{x^i, 0}. We define ā := max{a_ij : 1 ≤ i ≤ ℓ, 1 ≤ j ≤ d}, where a_ij are the entries of the capacity consumption matrix A. C_i, i ≥ 1, will denote positive constants which are independent of a given demand function, but may depend on the parameters of the class of admissible demand functions L and on A, x and T. Recall that e denotes the vector of ones in R^ℓ. For a sequence {a_n} of real numbers, we will say it converges to infinity at a polynomial rate if there exists β > 0 such that lim inf_{n→∞} a_n/n^β > 0.
With some abuse of notation, for a vector y ∈ R_+^d and a d-vector of unit rate Poisson processes N(·), we will use N(y) to denote the vector with ith component N^i(y^i), i = 1, …, d. Finally, we record two comments that will be used in the proofs.

Comment 1. Recall the definition of problem (4). Since D_p is bounded, the price charged for any product never exceeds, say, M̄. Consider a system where backlogging is allowed in the following sense: for each unit of resource backlogged, the system incurs a penalty of M̄. Recall that A is assumed to be integer valued with no zero column, and hence any time the new system receives a request such that insufficient resources are available to fulfill it, a penalty of at least M̄ is incurred. Consider any admissible policy π that applies p_∞ for the remaining time horizon as soon as one resource is out of stock. (Note that all the policies introduced in the main text are of this form.) Since M̄ exceeds the price that the system receives, the expected revenues of such a policy π in the original system, J^π(x, T; λ), are bounded below by the ones in the new system (note that in the latter, π does not apply p_∞ if the system runs out of any resource).

Comment 2. We will denote by J^D(x, T|λ) the optimal value of the deterministic relaxation (11). First note that J_n^D = nJ^D. We will also use the fact that

\[
\inf_{\lambda\in\mathcal{L}} J^D(x,T\,|\,\lambda) \ge m^D,
\]

where m^D = mT′ > 0 and T′ = min{T, min_{1≤i≤ℓ} x^i/(āMd)}. Indeed, for any λ ∈ L, there is a price q ∈ D_p such that r(q) ≥ m. Consider the policy that applies q on [0, T′] and then applies p_∞ up until T. This solution is feasible since Aλ(q)T′ ≤ dāMT′e ≤ x. In addition, the revenues generated by the policy above are at least mT′.

Proof of Theorem 1. Fix λ ∈ L. For simplicity, we restrict attention to the product set D_p = ∏_{i=1}^d [p_i, p̄_i]. Let M̄ = max_{1≤i≤d} p̄_i be the maximum price a customer will ever pay for a product. It is easy to verify that the deterministic optimization problem given in (11) is a convex problem whose solution is given by a constant price vector p̃ (cf. Gallego and van Ryzin (1997)). Let π be the policy defined by means of Algorithm 1.

Step 1. We first focus on the learning and optimization phases. Let τ_n be such that τ_n = o(1) and nτ_n → ∞ at a polynomial rate. Let κ_n be a sequence of integers such that κ_n → ∞ and n∆_n := nτ_n/κ_n → ∞ at a polynomial rate. Divide each interval [p_i, p̄_i], i = 1, …, d, into ⌊κ_n^{1/d}⌋ intervals of equal length and consider the resulting grid in D_p. The latter has κ′_n = ⌊κ_n^{1/d}⌋^d hyper-rectangles. For each one, let p_i be the largest vector (where the largest vector of a hyper-rectangle ∏_{i=1}^d [a_i, b_i] is defined to be (b_1, …, b_d)), and consider the set P^{κ′_n} = {p_1, p_2, …, p_{κ′_n}}. Note that κ′_n/κ_n → 1 as n → ∞, and with some abuse of notation we use both κ_n and κ′_n interchangeably. Now partition [0, τ_n] into κ_n intervals of length ∆_n and apply the price vector p_i on the ith interval. Define

\[
\hat\lambda(p_i) = \frac{N\big(n\Delta_n \sum_{j=1}^{i} \lambda(p_j)\big) - N\big(n\Delta_n \sum_{j=1}^{i-1} \lambda(p_j)\big)}{n\Delta_n}, \qquad i = 1,\ldots,\kappa_n,
\]

where N(·) is the d-vector of unit rate Poisson processes. Thus λ̂(p_i) denotes the number of requests for each product over successive intervals of length ∆_n, normalized by n∆_n. We now choose the “best” price among “almost feasible” prices. Specifically, we let δ_n = C_1(log n)^{1/2} max{1/κ_n^{1/d}, (n∆_n)^{-1/2}}, where C_1 > 0 is a design parameter to be chosen later, and set r̂(p_i) = p_i · λ̂(p_i) if Aλ̂(p_i)T ≤ x + eδ_n; otherwise set r̂(p_i) = 0.
Step 2. Here, we derive a lower bound on the expected revenues under the policy $\pi$. We will need the following two lemmas, whose proofs are deferred to Appendix B.

Lemma 1 Fix $\eta > 0$. Suppose that $\mu_j \in (0, M)$, $j = 1, \ldots, d$, and $r_n = n^\beta$ with $\beta > 0$. Then, if $\epsilon_n = C(\eta)(\log n)^{1/2} r_n^{-1/2}$ with $C(\eta) = 2d\eta^{1/2}\bar{a}M^{1/2}$, the following holds:
$$
\mathbb{P}\big( A(N(\mu r_n) - \mu r_n) \not\le r_n\epsilon_n e \big) \le \frac{C_2}{n^\eta}, \qquad
\mathbb{P}\big( A(N(\mu r_n) - \mu r_n) \not\ge -r_n\epsilon_n e \big) \le \frac{C_2}{n^\eta},
$$
where $C_2 > 0$ is an appropriately chosen constant.

From now on, we fix $\eta \ge 2$ and $C_1 = 2\max\{1, \bar{p}\}C(\eta)$. Using the previous lemma, we have the following.

Lemma 2 Let $\mathcal{P}^n_f = \{p_i \in \mathcal{P}^{\kappa_n} : A\hat{\lambda}(p_i)T \le x + \delta_n e\}$. Then, for a suitably large constant $C_3 > 0$,
$$
\mathbb{P}\big( r(\tilde{p}) - r(\hat{p}) > \delta_n \big) \le \frac{C_3}{n^\eta}, \qquad
\mathbb{P}\big( \hat{p} \notin \mathcal{P}^n_f \big) \le \frac{C_3}{n^\eta}.
$$

We define $X_n^{(L)} = \sum_{i=1}^{\kappa_n} \lambda(p_i)n\Delta_n$, $X_n^{(P)} = \lambda(\hat{p})n(T - \tau_n)$, and put $Y_n = AN(X_n^{(L)} + X_n^{(P)})$. In the rest of the proof, we will use the fact that, given $\hat{p}$,
$$
Y_n = \sum_{i=1}^{\kappa_n} A\hat{\lambda}(p_i)n\Delta_n + AN(X_n^{(L)} + X_n^{(P)}) - AN(X_n^{(L)}),
$$
and that $N(X_n^{(L)} + X_n^{(P)}) - N(X_n^{(L)})$ has the same distribution as $N(X_n^{(P)})$. Recalling Comment 1 in the preamble of the appendix, note that $Y_n$ is the total potential demand (for resources) under $\pi$ if one were never to use $p^\infty$, and that one can lower bound the revenues under $\pi$ as follows:
$$
J_n^\pi \ge \mathbb{E}\Big[ \hat{p} \cdot \big[N(X_n^{(L)} + X_n^{(P)}) - N(X_n^{(L)})\big] \Big] - \bar{M}e \cdot \mathbb{E}\big[ (Y_n - nx)^+ \big]. \tag{A-2}
$$
The first term on the RHS of (A-2) can be bounded as follows:
$$
\begin{aligned}
\mathbb{E}\Big[ \hat{p} \cdot \big[N(X_n^{(L)} + X_n^{(P)}) - N(X_n^{(L)})\big] \Big]
&= \mathbb{E}\Big[ \mathbb{E}\big[ \hat{p} \cdot N(\lambda(\hat{p})n(T - \tau_n)) \,\big|\, \hat{p} \big] \Big] \\
&= \mathbb{E}\big[ r(\hat{p}) \big]\, n(T - \tau_n) \\
&\stackrel{(a)}{=} \Big\{ r(\tilde{p}) + \mathbb{E}\big[ r(\hat{p}) - r(\tilde{p}) \,\big|\, r(\hat{p}) - r(\tilde{p}) > -\delta_n \big]\, \mathbb{P}\big( r(\hat{p}) - r(\tilde{p}) > -\delta_n \big) \\
&\qquad\quad + \mathbb{E}\big[ r(\hat{p}) - r(\tilde{p}) \,\big|\, r(\hat{p}) - r(\tilde{p}) \le -\delta_n \big]\, \mathbb{P}\big( r(\hat{p}) - r(\tilde{p}) \le -\delta_n \big) \Big\}\, n(T - \tau_n) \\
&\stackrel{(b)}{\ge} \Big[ r(\tilde{p}) - \delta_n - \frac{C_4}{n^\eta} \Big] n(T - \tau_n),
\end{aligned} \tag{A-3}
$$
where $C_4$ is a suitably large positive constant. Note that (a) follows from conditioning and (b) follows from Lemma 2 and the fact that $r(\cdot)$ is bounded, say by $d\bar{M}M$.

Let us now examine the second term on the RHS of (A-2). Let $C' > 0$ be a constant to be specified later and $\delta_n' = C'\delta_n$. Then
$$
\begin{aligned}
\mathbb{E}\big[ (Y_n - nx)^+ \big]
&= \mathbb{E}\big[ (Y_n - nx)^+ \,\big|\, Y_n - nx \le n\delta_n' e \big]\, \mathbb{P}\big( Y_n - nx \le n\delta_n' e \big) \\
&\qquad + \mathbb{E}\big[ (Y_n - nx)^+ \,\big|\, Y_n - nx \not\le n\delta_n' e \big]\, \mathbb{P}\big( Y_n - nx \not\le n\delta_n' e \big) \\
&\le n\delta_n' e + \mathbb{E}\big[ Y_n \,\big|\, Y_n \not\le nx + n\delta_n' e \big]\, \mathbb{P}\big( Y_n - nx \not\le n\delta_n' e \big).
\end{aligned}
$$
Now, for a Poisson random variable $Z$ with mean $\mu$, it is easy to see that $\mathbb{E}[Z \mid Z > a] \le a + 1 + \mu$. In particular, each component of $Y_n$ is a Poisson random variable with rate less than $nMT$, and hence
$$
\mathbb{E}\big[ Y_n \,\big|\, Y_n \not\le nx + n\delta_n' e \big] \le nx + (n\delta_n' + 1 + nMT)e.
$$
Let us evaluate the probability of running out of some resource by more than $n\delta_n'$. Specifically,
$$
\begin{aligned}
\mathbb{P}\big( Y_n - nx \not\le n\delta_n' e \big)
&\le \mathbb{P}\Big( AN(n\lambda(\hat{p})(T - \tau_n)) - An\lambda(\hat{p})(T - \tau_n) \not\le \tfrac{1}{3}n\delta_n' e \Big) \\
&\qquad + \mathbb{P}\Big( A\lambda(\hat{p})n(T - \tau_n) \not\le n\big(x + \tfrac{\delta_n'}{3}e\big) \Big)
 + \mathbb{P}\Big( \sum_{i=1}^{\kappa_n} A\hat{\lambda}(p_i)n\Delta_n \not\le \tfrac{1}{3}n\delta_n' e \Big).
\end{aligned} \tag{A-4}
$$
Consider the first term on the RHS of (A-4). We have $n\delta_n' > n(T - \tau_n)\,3C(\eta)(\log n)^{1/2}(n(T - \tau_n))^{-1/2}$ for $n$ large enough, and hence, if $C' \ge 3T$, one can condition on $\hat{p}$ and apply Lemma 1 (with $\mu = \lambda(\hat{p})$, $r_n = n(T - \tau_n)$) to get
$$
\begin{aligned}
&\mathbb{P}\Big( AN(\lambda(\hat{p})n(T - \tau_n)) - A\lambda(\hat{p})n(T - \tau_n) \not\le \tfrac{1}{3}n\delta_n' e \Big) \\
&\le \mathbb{E}\Big[ \mathbb{P}\Big( AN(\lambda(\hat{p})n(T - \tau_n)) - A\lambda(\hat{p})n(T - \tau_n) \not\le n(T - \tau_n)(C_1C'/3T)(\log n)^{1/2}\big(n(T - \tau_n)\big)^{-1/2} e \,\Big|\, \hat{p} \Big) \Big] \le \frac{C_3}{n^\eta}.
\end{aligned}
$$
Consider now the second term on the RHS of (A-4). We have
$$
\begin{aligned}
\mathbb{P}\Big( A\lambda(\hat{p})n(T - \tau_n) \not\le n\big(x + \tfrac{\delta_n'}{3}e\big) \Big)
&= \mathbb{P}\Big( A\big[\lambda(\hat{p})T - \hat{\lambda}(\hat{p})T\big] + A\hat{\lambda}(\hat{p})T \not\le \tfrac{1}{1 - \tau_n/T}\big(x + \tfrac{\delta_n'}{3}e\big) \Big) \\
&\le \mathbb{P}\Big( A\big[\lambda(\hat{p})T - \hat{\lambda}(\hat{p})T\big] \not\le \tfrac{\delta_n'}{6}e \Big) + \mathbb{P}\Big( A\hat{\lambda}(\hat{p})T \not\le x + \tfrac{\delta_n'}{6}e \Big) \\
&= \mathbb{P}\Big( A\big(\lambda(\hat{p})n\Delta_nT - \hat{\lambda}(\hat{p})n\Delta_nT\big) \not\le \tfrac{\delta_n'}{6}n\Delta_n e \Big) + \mathbb{P}\Big( A\hat{\lambda}(\hat{p})T \not\le x + \tfrac{\delta_n'}{6}e \Big).
\end{aligned} \tag{A-5}
$$
Suppose that $C' \ge 6$. Then by Lemma 2, the second term above is bounded by $C_5/n^\eta$ for a large enough choice of $C_5 > 0$. The first term on the RHS of (A-5) is upper bounded by $C_3/n^\eta$ by Lemma 1.

Consider the third term on the RHS of (A-4). We have
$$
\begin{aligned}
\mathbb{P}\Big( \sum_{i=1}^{\kappa_n} A\hat{\lambda}(p_i)n\Delta_n \not\le \tfrac{1}{3}n\delta_n' e \Big)
&\le \sum_{i=1}^{\kappa_n} \mathbb{P}\Big( A\hat{\lambda}(p_i)n\Delta_n \not\le \tfrac{1}{3\kappa_n}n\delta_n' e \Big) \\
&= \sum_{i=1}^{\kappa_n} \mathbb{P}\Big( A\big[N(\lambda(p_i)n\Delta_n) - \lambda(p_i)n\Delta_n\big] \not\le n\Delta_n\Big( \tfrac{1}{3}\tfrac{\delta_n'}{\tau_n}e - A\lambda(p_i) \Big) \Big).
\end{aligned}
$$
Now, if $\delta_n'/\tau_n \to \infty$ (which holds, for example, if $\tau_n = n^{-1/(d+3)}$ and $\kappa_n = n^{d/(d+3)}$), then for $n$ sufficiently large we have $(1/3)(\delta_n'/\tau_n)e - A\lambda(p_i) \ge 1$ for all $i = 1, \ldots, \kappa_n$, and Lemma 1 yields
$$
\mathbb{P}\Big( \sum_{i=1}^{\kappa_n} A\hat{\lambda}(p_i)n\Delta_n \not\le \tfrac{1}{3}n\delta_n' e \Big) \le \kappa_n\frac{C_3}{n^\eta} \le \frac{C_3}{n^{\eta-1}}.
$$
We conclude that, with $C' = \max\{3T, 6\}$ and for some $C_6 > 0$, $\mathbb{P}(Y_n \not\le nx + n\delta_n' e) \le C_6/n^{\eta-1}$, and in turn
$$
\mathbb{E}\big[ (Y_n - nx)^+ \big] \le n\delta_n' e + \mathbb{E}\big[ Y_n \,\big|\, Y_n \not\le nx + n\delta_n' e \big] \frac{C_6}{n^{\eta-1}}. \tag{A-6}
$$
Combining (A-2), (A-3) and (A-6) we have
$$
\begin{aligned}
J_n^\pi &\ge \Big[ r(\tilde{p}) - \delta_n - \frac{C_4}{n^\eta} \Big] n(T - \tau_n) - \bar{M}n\delta_n' - \bar{M}\big( nx \cdot e + n\delta_n' + 1 + nMT \big)\frac{C_6}{n^{\eta-1}} \\
&= r(\tilde{p})nT - n\Big[ r(\tilde{p})\tau_n + (T - \tau_n)\delta_n + (T - \tau_n)\frac{C_4}{n^\eta} + \bar{M}C'\delta_n + (\bar{M}x \cdot e + MT)\frac{C_6}{n^{\eta-1}} + C'\delta_n\frac{C_6}{n^{\eta-1}} + \frac{C_6}{n^{\eta-2}} \Big] \\
&\stackrel{(a)}{\ge} r(\tilde{p})nT - nC_7\big[ \tau_n + \delta_n + 1/n^{\eta-2} \big],
\end{aligned}
$$
where (a) follows from the fact that $\delta_n \to 0$ and by choosing $C_7 > 0$ suitably large.

Step 3. We now conclude the proof. Note that under the current assumptions $\mathcal{D}_\lambda$ is convex. Gallego and van Ryzin (1997, Theorem 1) show that under these conditions the optimal value of problem (11), say $J_n^D$, serves as an upper bound to $J_n^*$. Note that $J_n^D = nr(\tilde{p})T$. Define $f(n) := C_7\big[\tau_n + \delta_n + 1/n^{\eta-2}\big]$ and note that $f(n) \ge 0$ for all $n \ge 0$ and that $f(n) \to 0$ as $n \to \infty$. In addition, $f(n)$ does not depend on the specific underlying demand function $\lambda$. By the remark in the preamble, $J_n^D \ge nm^D > 0$ and hence
$$
\frac{J_n^\pi}{J_n^*} \ge \frac{J_n^\pi}{J_n^D} \ge 1 - \frac{f(n)}{m^D},
$$
implying that, uniformly over $\lambda \in \mathcal{L}$,
$$
\liminf_{n\to\infty} \frac{J_n^\pi}{J_n^*} \ge 1.
$$
This, in conjunction with the inequality $J_n^\pi \le J_n^*$, completes the proof.

To obtain the rate of convergence stated in (10) in Remark 1, note that the orders of the terms $\tau_n$ and $\delta_n$ are balanced by choosing $\tau_n = n^{-1/(d+3)}$ and $\kappa_n = n^{d/(d+3)}$. With this choice we have, for $C_8 = C_7/m^D$, $f(n)/m^D = C_8\big[(\log n)^{1/2}/n^{1/(d+3)} + 1/n^{\eta-2}\big]$, implying that
$$
\sup_{\lambda\in\mathcal{L}} \limsup_{n\to\infty} \frac{1 - J_n^\pi/J_n^*}{(\log n)^{1/2}\, n^{-1/(d+3)}} < \infty.
$$
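For completeness, the balancing computation behind this choice of $\tau_n$ and $\kappa_n$ can be spelled out (it is implicit in the argument above). Taking $\tau_n = n^{-a}$ and $\kappa_n = n^{ad}$, so that the grid resolution $\kappa_n^{-1/d}$ matches the learning-phase length, the three polynomial error orders are
$$
\tau_n = n^{-a}, \qquad \kappa_n^{-1/d} = n^{-a}, \qquad (n\Delta_n)^{-1/2} = \big(n\tau_n/\kappa_n\big)^{-1/2} = n^{-(1-a-ad)/2},
$$
and equating the exponents via $(1 - a - ad)/2 = a$ gives $a = 1/(d+3)$, whence $f(n) = O\big((\log n)^{1/2}n^{-1/(d+3)}\big)$.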
Proof of Theorem 2. Fix $\lambda \in \mathcal{L}$ and $\eta \ge 1$. Denote by $\{\lambda_1, \ldots, \lambda_k\}$ the intensities corresponding to the prices $\{p_1, \ldots, p_k\}$. Define $B$ to be the matrix with $i$th column equal to $A\lambda_i$ and let $(P_0)$ denote the following linear optimization problem:
$$
\max\Big\{ \sum_{i=1}^k p_i \cdot \lambda_i t_i \;:\; Bt \le x,\ \sum_{i=1}^k t_i \le T,\ t_i \ge 0,\ i = 1, \ldots, k \Big\}.
$$
The optimal value of $(P_0)$, $V^*_{(P_0)}$, is known to be an upper bound to $J^*$ (cf. Gallego and van Ryzin (1997, Theorem 1)). For a system of "size" $n$, the optimal value is just $n$ times the optimal value of the system of size 1, and the optimal solutions are the same. In what follows, for any feasible vector $t$, we use $V_{(P_0)}(t)$ to denote the value of the objective function.

Step 1. We first focus on the learning and optimization phases. Let $\tau_n$ be such that $\tau_n = o(1)$ and $n\tau_n \to \infty$ as $n \to \infty$ at a polynomial rate. Divide $[0, \tau_n]$ into $k$ intervals of equal length $\Delta_n = \tau_n/k$, and apply each feasible price during $\Delta_n$ time units. Let
$$
\hat{\lambda}(p_i) = \frac{N\big(n\Delta_n \sum_{j=1}^{i} \lambda_j\big) - N\big(n\Delta_n \sum_{j=1}^{i-1} \lambda_j\big)}{n\Delta_n}, \qquad i = 1, \ldots, k.
$$
Let $(\hat{P})$ denote the following linear optimization problem:
$$
\max\Big\{ \sum_{i=1}^k p_i \cdot \hat{\lambda}(p_i)t_i \;:\; \sum_{i=1}^k A\hat{\lambda}(p_i)t_i \le x - Ae\delta_n,\ \sum_{i=1}^k t_i \le T - \tau_n,\ t_i \ge 0,\ i = 1, \ldots, k \Big\},
$$
where $\delta_n := C_1(\log n)^{1/2}(n\Delta_n)^{-1/2}$ with $C_1 > 0$ to be specified later. For $n$ sufficiently large, the feasible set of $(\hat{P})$ is nonempty (since $x - Ae\delta_n \ge 0$) and compact, and hence the latter admits an optimal solution, say $\hat{t}$. In what follows, for any feasible vector $t$, we use $V_{(\hat{P})}(t)$ to denote the value of the objective function.
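As an aside, $(\hat{P})$ is an ordinary finite-dimensional linear program, and a sketch like the following shows how it could be solved numerically with scipy.optimize.linprog. The sketch is illustrative only; the estimates lam_hat and all numerical values are hypothetical placeholders for the output of the learning phase.

    import numpy as np
    from scipy.optimize import linprog

    # Hypothetical output of the learning phase for k = 3 test prices,
    # d = 2 products and one resource (all values illustrative).
    prices = [np.array([2.0, 2.0]), np.array([3.0, 3.0]), np.array([4.0, 4.0])]
    lam_hat = [np.array([6.5, 6.2]), np.array([5.1, 4.8]), np.array([3.6, 3.3])]
    A = np.array([[1.0, 1.0]])
    x, T, tau_n, delta_n = np.array([15.0]), 1.0, 0.05, 0.1

    k = len(prices)
    # linprog minimizes, so negate the estimated revenue rates p_i . lam_hat_i.
    c = -np.array([p @ lam for p, lam in zip(prices, lam_hat)])
    # Estimated resource constraints: sum_i A lam_hat_i t_i <= x - A e delta_n.
    A_res = np.hstack([(A @ lam).reshape(-1, 1) for lam in lam_hat])
    b_res = x - (A @ np.ones(A.shape[1])) * delta_n
    # Time budget: sum_i t_i <= T - tau_n.
    A_ub = np.vstack([A_res, np.ones((1, k))])
    b_ub = np.concatenate([b_res, [T - tau_n]])

    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0.0, None)] * k)
    t_hat = res.x
    print("t_hat =", t_hat)

Here the first block of rows encodes the estimated resource constraints and the last row the time budget $T - \tau_n$; by Lemma 3 below, on the event $\mathcal{H}$ the resulting $\hat{t}$ is feasible for $(P_0)$ as well.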
Step 2. Here, we derive a lower bound on the expected revenues under the policy $\pi$. Consider applying the solution $\hat{t}$ to the stochastic system on the interval $(\tau_n, T]$. Let $\bar{M} := \max\{p_1, \ldots, p_k\}$ and define $X_n^{(L)} := \sum_{i=1}^k n\lambda(p_i)\Delta_n$ and $X_n^{(i)} := \sum_{j=1}^i n\lambda_j\hat{t}_j$, $i = 1, \ldots, k$. Finally, put $Y_n = AN(X_n^{(L)} + X_n^{(k)})$. As noted in the preamble of the appendix, one can lower bound $J_n^\pi$ as follows:
$$
J_n^\pi \ge \mathbb{E}\Big[ \sum_{i=1}^k p_i \cdot \big[ N(X_n^{(L)} + X_n^{(i)}) - N(X_n^{(L)} + X_n^{(i-1)}) \big] \Big] - \bar{M}e \cdot \mathbb{E}\big[ (Y_n - nx)^+ \big]
= n\sum_{i=1}^k p_i \cdot \lambda_i\mathbb{E}[\hat{t}_i] - \bar{M}e \cdot \mathbb{E}\big[ (Y_n - nx)^+ \big], \tag{A-7}
$$
where the equality follows from the fact that, given $\hat{t}$, $N(X_n^{(L)} + X_n^{(i)}) - N(X_n^{(L)} + X_n^{(i-1)})$ is distributed as a Poisson random variable with mean $n\lambda_i\hat{t}_i$.

Let $\mathcal{H} := \big\{ \omega : \max_{1\le i\le k}\|\lambda_i - \hat{\lambda}(p_i)\|T \le \delta_n,\ \hat{\lambda}(p_i) \ge (m_1/2)e \text{ for all } 1\le i\le k \big\}$. Since revenues are non-negative, we can lower bound the first sum in (A-7) as follows:
$$
\sum_{i=1}^k p_i \cdot \lambda_i\mathbb{E}[\hat{t}_i] \ge \mathbb{E}\Big[ \sum_{i=1}^k p_i \cdot \lambda_i\hat{t}_i \,\Big|\, \mathcal{H} \Big]\mathbb{P}(\mathcal{H}).
$$

Lemma 3 For $\omega \in \mathcal{H}$, $\hat{t}$ is feasible for $(P_0)$, and for $C_2, C_3 > 0$ suitably large we have
$$
V_{(P_0)}(\hat{t}) \ge V_{(\hat{P})}(\hat{t}) - C_2\delta_n, \tag{A-8}
$$
$$
V_{(\hat{P})}(\hat{t}) \ge V^*_{(P_0)} - C_3\max\{\delta_n, \tau_n\}. \tag{A-9}
$$

We deduce that
$$
\mathbb{E}\Big[ \sum_{i=1}^k p_i \cdot \lambda_i\hat{t}_i \,\Big|\, \mathcal{H} \Big]
= \mathbb{E}\big[ V_{(P_0)}(\hat{t}) \,\big|\, \mathcal{H} \big]
\stackrel{(a)}{\ge} \mathbb{E}\big[ V_{(\hat{P})}(\hat{t}) - C_2\delta_n \,\big|\, \mathcal{H} \big]
\stackrel{(b)}{\ge} V^*_{(P_0)} - (C_2 + C_3)\max\{\delta_n, \tau_n\},
$$
where (a) follows from (A-8) and (b) follows from (A-9). We now turn to bounding the probability of the event $\mathcal{H}^c$:
$$
\begin{aligned}
\mathbb{P}(\mathcal{H}^c)
&\stackrel{(a)}{\le} \mathbb{P}\Big( \max_{1\le i\le k}\|\lambda_i - \hat{\lambda}(p_i)\|T > \delta_n \Big) + \mathbb{P}\Big( \min_{1\le i\le k}\hat{\lambda}(p_i) < (m_1/2)e \Big) \\
&\stackrel{(b)}{\le} \sum_{i=1}^k \mathbb{P}\big( \|\lambda_i - \hat{\lambda}(p_i)\|T > \delta_n \big) + \sum_{i=1}^k \mathbb{P}\big( \hat{\lambda}_i - \lambda_i < (m_1/2)e - \lambda_i \big) \\
&\stackrel{(c)}{\le} \sum_{i=1}^k\sum_{j=1}^d \mathbb{P}\big( |\lambda_i^j - \hat{\lambda}^j(p_i)| > \delta_n/T \big) + \sum_{i=1}^k\sum_{j=1}^d \mathbb{P}\big( \hat{\lambda}_i^j - \lambda_i^j < -m_1/2 \big) \\
&\stackrel{(d)}{\le} \frac{C_4}{n^\eta},
\end{aligned}
$$
where $C_4 > 0$ is suitably large, (a), (b), (c) follow from union bounds, and (d) follows from a direct application of Lemma 1 and the appropriate choice of $C_1$. Hence,
$$
n\sum_{i=1}^k p_i \cdot \lambda_i\mathbb{E}[\hat{t}_i] \ge n\Big[ V^*_{(P_0)} - (C_2 + C_3)\max\{\delta_n, \tau_n\} \Big]\Big( 1 - \frac{C_4}{n^\eta} \Big). \tag{A-10}
$$
We now look into the penalty term, i.e., the second term on the RHS of (A-7). To that end, let $C' > 0$ be a constant to be specified, let $\delta_n' = C'\delta_n$, and put $\mathcal{E} := \{\omega : Y_n - nx \le n\delta_n' e\}$. Note that
$$
\begin{aligned}
\mathbb{E}\big[ (Y_n - nx)^+ \big]
&= \mathbb{E}\big[ (Y_n - nx)^+ \,\big|\, \mathcal{E} \big]\mathbb{P}(\mathcal{E}) + \mathbb{E}\big[ (Y_n - nx)^+ \,\big|\, \mathcal{E}^c \big]\mathbb{P}(\mathcal{E}^c) \\
&\le n\delta_n' e + \mathbb{E}\big[ (Y_n - nx)^+ \,\big|\, \mathcal{E}^c \big]\mathbb{P}(\mathcal{E}^c) \\
&\stackrel{(a)}{\le} n\delta_n' e + (n\delta_n' + 1 + nMT)\mathbb{P}(\mathcal{E}^c)e,
\end{aligned}
$$
where (a) follows from the definition of $\mathcal{E}$ and the fact that for a Poisson random variable $Z$ with mean $\mu$, $\mathbb{E}[Z \mid Z > a] \le a + 1 + \mu$. Now,
$$
\begin{aligned}
\mathbb{P}(\mathcal{E}^c)
&= \mathbb{P}\Big( \sum_{i=1}^k A\big[ N(X_n^{(L)} + X_n^{(i)}) - N(X_n^{(L)} + X_n^{(i-1)}) \big] + \sum_{i=1}^k An\hat{\lambda}(p_i)\Delta_n \not\le nx + n\delta_n' e \Big) \\
&\le \mathbb{P}\Big( \sum_{i=1}^k A\big[ N(X_n^{(L)} + X_n^{(i)}) - N(X_n^{(L)} + X_n^{(i-1)}) \big] \not\le nx + \tfrac{1}{2}n\delta_n' e \Big) + \mathbb{P}\Big( \sum_{i=1}^k An\hat{\lambda}(p_i)\Delta_n \not\le \tfrac{1}{2}n\delta_n' e \Big).
\end{aligned} \tag{A-11}
$$
Using Lemma 1, the second term on the RHS of (A-11) is seen to be bounded by $C_5/n^\eta$. On the other hand, the first term on the RHS of (A-11) can be bounded as follows:
$$
\begin{aligned}
&\mathbb{P}\Big( \sum_{i=1}^k A\big[ N(X_n^{(L)} + X_n^{(i)}) - N(X_n^{(L)} + X_n^{(i-1)}) \big] \not\le nx + \tfrac{1}{2}n\delta_n' e \Big) \\
&\quad \le \mathbb{P}\Big( \sum_{i=1}^k A\Big[ \big[ N(X_n^{(L)} + X_n^{(i)}) - N(X_n^{(L)} + X_n^{(i-1)}) \big] - n\lambda_i\hat{t}_i \Big] \not\le \tfrac{1}{4}n\delta_n' e \Big) \\
&\qquad + \mathbb{P}\Big( \sum_{i=1}^k An\big[ \lambda_i - \hat{\lambda}(p_i) \big]\hat{t}_i \not\le \tfrac{1}{4}n\delta_n' e \Big) + \mathbb{P}\Big( \sum_{i=1}^k An\hat{\lambda}(p_i)\hat{t}_i \not\le nx \Big).
\end{aligned} \tag{A-12}
$$
Note that the feasibility of $\hat{t}$ for $(\hat{P})$ implies that the last term on the RHS above is equal to zero. With an appropriate choice of $C'$, Lemma 1 yields that the first two terms on the RHS of (A-12) are bounded by $C_6/n^\eta$ for $C_6 > 0$ suitably large. We deduce that
$$
\mathbb{E}\big[ (Y_n - nx)^+ \big] \le n\delta_n' e + (n\delta_n' + 1 + nMT)\frac{C_5 + C_6}{n^\eta}e.
$$
Combining the above with (A-7) and (A-10), we have
$$
\begin{aligned}
J_n^\pi &\ge n\sum_{i=1}^k p_i \cdot \lambda_i\mathbb{E}[\hat{t}_i] - \bar{M}e \cdot \mathbb{E}\big[ (Y_n - nx)^+ \big] \\
&\ge n\Big[ V^*_{(P_0)} - (C_2 + C_3)\max\{\delta_n, \tau_n\} \Big]\Big( 1 - \frac{C_4}{n^\eta} \Big) - \bar{M}e \cdot e\Big[ n\delta_n' + (n\delta_n' + 1 + nMT)\frac{C_5 + C_6}{n^\eta} \Big] \\
&\ge nV^*_{(P_0)} - C_9 n\big( \max\{\delta_n, \tau_n\} + 1/n^\eta \big).
\end{aligned}
$$

Step 3. We now conclude the proof. Recalling that $m^D > 0$ bounds $V^*_{(P_0)}$ from below for all $\lambda \in \mathcal{L}$, we have
$$
\frac{J_n^\pi}{J_n^*} \ge \frac{J_n^\pi}{nV^*_{(P_0)}} \ge 1 - \frac{C_9\big( \max\{\delta_n, \tau_n\} + 1/n^\eta \big)}{m^D},
$$
implying that, uniformly over $\lambda \in \mathcal{L}$,
$$
\liminf_{n\to\infty} \frac{J_n^\pi}{J_n^*} \ge 1.
$$
This, in conjunction with the inequality $J_n^\pi \le J_n^*$, completes the proof.

To obtain the rate of convergence stated in (15) in Remark 2, note that the orders of the terms $\delta_n$ and $\tau_n$ are balanced by choosing $\tau_n \asymp n^{-1/3}$. With this choice we have
$$
\sup_{\lambda\in\mathcal{L}} \limsup_{n\to\infty} \frac{1 - J_n^\pi/J_n^*}{(\log n)^{1/2}\, n^{-1/3}} < \infty.
$$

B Proofs of Auxiliary Results

In what follows, $C_i'$, $i \ge 1$, will denote positive constants that depend only on $A$, $x$, $T$ and the parameters of the class $\mathcal{L}$, but not on a specific function $\lambda \in \mathcal{L}$.

Proof of Lemma 1. Let $J_i = \{j \in \{1, \ldots, d\} : a_{ij} \ne 0\}$. We proceed with the following inequalities:
$$
\begin{aligned}
\mathbb{P}\big( A[N(\mu r_n) - \mu r_n] \not\le r_n\epsilon_n e \big)
&\stackrel{(a)}{\le} \sum_{i=1}^\ell \mathbb{P}\Big( \sum_{j=1}^d a_{ij}\big[ N(\mu_j r_n) - \mu_j r_n \big] > r_n\epsilon_n \Big) \\
&\le \sum_{i=1}^\ell \sum_{j\in J_i} \mathbb{P}\Big( N(\mu_j r_n) - \mu_j r_n > \frac{r_n\epsilon_n}{da_{ij}} \Big) \\
&\le \ell\sum_{j=1}^d \mathbb{P}\Big( N(\mu_j r_n) - \mu_j r_n > \frac{r_n\epsilon_n}{d\bar{a}} \Big) \\
&\stackrel{(b)}{\le} \ell\sum_{j=1}^d \exp\Big\{ -\theta_j r_n\Big( \mu_j + \frac{\epsilon_n}{d\bar{a}} \Big) + \big( \exp\{\theta_j\} - 1 \big)\mu_j r_n \Big\},
\end{aligned} \tag{B-1}
$$
where (a) follows from a union bound and (b) follows from the Chernoff bound. The expression in each of the exponents is minimized by the choice of $\theta_j > 0$ defined by
$$
\theta_j = \log\Big( 1 + \frac{\epsilon_n}{d\bar{a}\mu_j} \Big). \tag{B-2}
$$
Plugging back into (B-1) yields
$$
\begin{aligned}
\mathbb{P}\big( A[N(\mu r_n) - \mu r_n] \not\le r_n\epsilon_n e \big)
&\le \ell\sum_{j=1}^d \exp\Big\{ r_n\Big[ -\log\Big( 1 + \frac{\epsilon_n}{d\bar{a}\mu_j} \Big)\Big( \mu_j + \frac{\epsilon_n}{d\bar{a}} \Big) + \frac{\epsilon_n}{d\bar{a}} \Big] \Big\} \\
&\le \ell d\exp\Big\{ r_n\Big[ -\log\Big( 1 + \frac{\epsilon_n}{d\bar{a}M} \Big)\Big( M + \frac{\epsilon_n}{d\bar{a}} \Big) + \frac{\epsilon_n}{d\bar{a}} \Big] \Big\}.
\end{aligned} \tag{B-3}
$$
For the last inequality, note that the derivative of the term in the exponent with respect to $\mu_j$ is given by $-\log(1 + \epsilon_n/(d\bar{a}\mu_j)) + \epsilon_n/(d\bar{a}\mu_j)$, which is always positive for $\epsilon_n > 0$. Now, using a Taylor expansion, we get that for some $\xi \in [0, \epsilon_n/(d\bar{a}M)]$,
$$
M\Big[ -\log\Big( 1 + \frac{\epsilon_n}{d\bar{a}M} \Big)\Big( 1 + \frac{\epsilon_n}{d\bar{a}M} \Big) + \frac{\epsilon_n}{d\bar{a}M} \Big]
= -\frac{M}{2}\,\frac{1}{1+\xi}\,\frac{\epsilon_n^2}{d^2\bar{a}^2M^2}
\le -\frac{\epsilon_n^2}{4d^2\bar{a}^2M},
$$
where the last inequality holds only if $\epsilon_n/(d\bar{a}M) \le 1$ (which is valid for sufficiently large $n$). Substituting for $\epsilon_n$, we get
$$
\mathbb{P}\big( A[N(\mu r_n) - \mu r_n] \not\le r_n\epsilon_n e \big) \le \ell d\exp\Big\{ -\frac{(C(\eta))^2\log n}{4d^2\bar{a}^2M} \Big\} = \frac{\ell d}{n^\eta},
$$
hence the first result follows. The other inequality goes through in a similar fashion. This completes the proof.
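Although the proof above is self-contained, a quick Monte Carlo sanity check of the one-sided bound can be reassuring. The sketch below is illustrative only; the instance, the choice $\beta = 1$, and the comparison constant $C_2 = 1$ are arbitrary. Since the Chernoff-based bound is conservative, the empirical frequency should fall well below the stated rate.

    import numpy as np

    rng = np.random.default_rng(1)

    # Toy instance for Lemma 1 (all constants illustrative): ell = 1 resource,
    # d = 2 products, rates mu_j in (0, M), r_n = n (i.e., beta = 1).
    d = 2
    A = np.array([[1.0, 2.0]])
    a_bar, M, eta = 2.0, 5.0, 2.0
    mu = np.array([3.0, 4.0])

    for n in (10**2, 10**3, 10**4):
        r_n = float(n)
        C_eta = 2 * d * eta**0.5 * a_bar * M**0.5
        eps_n = C_eta * (np.log(n) / r_n) ** 0.5
        # Estimate P( A(N(mu r_n) - mu r_n) not<= r_n eps_n e ) by simulation.
        N = rng.poisson(mu * r_n, size=(20_000, d))
        dev = (A @ (N - mu * r_n).T).T          # shape (num_samples, ell)
        p_emp = np.mean(np.any(dev > r_n * eps_n, axis=1))
        print(f"n={n:>6}: empirical {p_emp:.2e} vs bound (C2=1) {n**-eta:.2e}")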
Proof of Lemma 2. The optimal vector $\tilde{p}$ for the deterministic problem is contained in one of the hyper-rectangles comprising the price grid. Let $p_j$ be the closest vector to $\tilde{p}$ in the price grid. Note that the index $j$ depends on $n$, but we do not make the $n$-dependence explicit to avoid cluttering the notation. We first show that $p_j \in \mathcal{P}^n_f$ with high probability. Note that $\|p_j - \tilde{p}\| \le C_1'/\kappa_n^{1/d}$ for some $C_1' > 0$, and hence $\|\lambda(p_j) - \lambda(\tilde{p})\| \le KC_1'/\kappa_n^{1/d}$. We deduce that
$$
\begin{aligned}
\mathbb{P}\big( p_j \notin \mathcal{P}^n_f \big)
&= \mathbb{P}\big( AN(\lambda(p_j)n\Delta_nT) > n\Delta_n(x + \delta_n e) \big) \\
&\le \mathbb{P}\big( AN\big( (\lambda(\tilde{p}) + C_1'K\kappa_n^{-1/d})n\Delta_nT \big) > n\Delta_n(x + \delta_n e) \big) \\
&\stackrel{(a)}{\le} \mathbb{P}\big( AN\big( (\lambda(\tilde{p}) + C_1'K\kappa_n^{-1/d})n\Delta_nT \big) - A\big( \lambda(\tilde{p}) + C_1'K\kappa_n^{-1/d} \big)n\Delta_nT > n\Delta_n w_n \big),
\end{aligned}
$$
where $w_n = \delta_n e - C_1'KT\kappa_n^{-1/d}Ae$. Note that (a) is a consequence of the feasibility of $\tilde{p}$ for the deterministic problem (in particular, $A\lambda(\tilde{p})n\Delta_nT \le n\Delta_n x$). Now, since $\delta_n\kappa_n^{1/d} \to \infty$, we have that
$$
w_n = \delta_n\Big( e - \frac{C_1'KT}{\delta_n\kappa_n^{1/d}}Ae \Big) \ge \frac{\delta_n}{2}e
$$
for $n$ sufficiently large. By using Lemma 1 (where $r_n$ and $\epsilon_n$ are here $n\Delta_n$ and $\delta_n/2$, respectively), we deduce that the above probability is bounded above by $C_2'/n^\eta$ for a sufficiently large $C_2' > 0$. We then have
$$
\mathbb{P}\big( r(\tilde{p}) - r(\hat{p}) > \delta_n \big)
\le \mathbb{P}\big( r(\tilde{p}) - r(\hat{p}) > \delta_n;\ p_j \in \mathcal{P}^n_f,\ \hat{r}(p_j) > 0 \big)
+ \mathbb{P}\big( p_j \notin \mathcal{P}^n_f \big) + \mathbb{P}\big( p_j \in \mathcal{P}^n_f,\ \hat{r}(p_j) = 0 \big). \tag{B-4}
$$
Now, under the condition that $p_j \in \mathcal{P}^n_f$, we have
$$
\begin{aligned}
r(\tilde{p}) - r(\hat{p})
&= r(\tilde{p}) - r(p_j) + r(p_j) - \hat{r}(p_j) + \hat{r}(p_j) - \hat{r}(\hat{p}) + \hat{r}(\hat{p}) - r(\hat{p}) \\
&\le r(\tilde{p}) - r(p_j) + r(p_j) - \hat{r}(p_j) + \hat{r}(\hat{p}) - r(\hat{p}),
\end{aligned}
$$
where the inequality follows from the definition of $\hat{p}$ given in (A-1). For the first term on the RHS above, note that for $C_3' > 0$ suitably large,
$$
\begin{aligned}
|r(p_j) - r(\tilde{p})|
&\le |p_j \cdot \lambda(p_j) - p_j \cdot \lambda(\tilde{p})| + |p_j \cdot \lambda(\tilde{p}) - \tilde{p} \cdot \lambda(\tilde{p})| \\
&\stackrel{(a)}{\le} \|p_j\|\,\|\lambda(p_j) - \lambda(\tilde{p})\| + \|\lambda(\tilde{p})\|\,\|p_j - \tilde{p}\| \\
&\stackrel{(b)}{\le} \|p_j\|K\frac{C_1'}{\kappa_n^{1/d}} + \|\lambda(\tilde{p})\|\frac{C_1'}{\kappa_n^{1/d}}
\stackrel{(c)}{\le} \frac{C_3'}{\kappa_n^{1/d}},
\end{aligned}
$$
where (a) follows from the Cauchy-Schwarz inequality, (b) follows from the Lipschitz condition on $\lambda$, and (c) follows from the fact that $\|p\| \le \bar{M}$ for all $p \in \mathcal{D}_p$. Now, recalling Comment 2 in the preamble of Appendix A, we have $r(\tilde{p})T \ge m^D > 0$, and hence, for $n$ sufficiently large, $r(p_j) > m^D/(2T)$. By Lemma 1, we deduce that
$$
\mathbb{P}\big( p_j \in \mathcal{P}^n_f,\ \hat{r}(p_j) = 0 \big) \le \mathbb{P}\big( p_j \cdot \hat{\lambda}(p_j) = 0 \big) \le \frac{C_4'}{n^\eta}.
$$
Coming back to (B-4), since $C_3'/\kappa_n^{1/d} < (1/4)\delta_n$ for $n$ sufficiently large,
$$
\begin{aligned}
\mathbb{P}\big( r(\tilde{p}) - r(\hat{p}) > \delta_n \big)
&\le \mathbb{P}\Big( r(p_j) - \hat{r}(p_j) > \tfrac{1}{2}\delta_n - \tfrac{C_3'}{\kappa_n^{1/d}};\ p_j \in \mathcal{P}^n_f \Big)
+ \mathbb{P}\Big( \hat{r}(\hat{p}) - r(\hat{p}) > \tfrac{1}{2}\delta_n;\ p_j \in \mathcal{P}^n_f,\ \hat{r}(p_j) > 0 \Big) \\
&\qquad + \mathbb{P}\big( p_j \notin \mathcal{P}^n_f \big) + \mathbb{P}\big( p_j \in \mathcal{P}^n_f,\ \hat{r}(p_j) = 0 \big) \\
&\le \mathbb{P}\Big( r(p_j) - \hat{r}(p_j) > \tfrac{1}{4}\delta_n \Big) + \mathbb{P}\Big( \hat{p} \cdot \hat{\lambda}(\hat{p}) - r(\hat{p}) > \tfrac{1}{2}\delta_n \Big) + \frac{C_2'}{n^\eta} + \frac{C_4'}{n^\eta}.
\end{aligned}
$$
By Lemma 1, the first two terms on the RHS above are bounded by $C_5'/n^\eta$ for some $C_5' > 0$, and the proof is complete.

Proof of Lemma 3. For $\omega \in \mathcal{H}$ we have
$$
\begin{aligned}
\sum_{i=1}^k A\lambda_i\hat{t}_i
&= \sum_{i=1}^k A\hat{\lambda}(p_i)\hat{t}_i + \sum_{i=1}^k A\big( \lambda_i - \hat{\lambda}(p_i) \big)\hat{t}_i \\
&\stackrel{(a)}{\le} x - \delta_nAe + \max_{1\le i\le k}\|\lambda_i - \hat{\lambda}(p_i)\|T\,Ae
\stackrel{(b)}{\le} x,
\end{aligned}
$$
where (a) follows from the feasibility of $\hat{t}$ for $(\hat{P})$ and the fact that $A$ has non-negative entries, and (b) follows from the fact that $\omega \in \mathcal{H}$. We deduce that for $\omega \in \mathcal{H}$, $\hat{t}$ is feasible for $(P_0)$. In addition, the cost of $\hat{t}$ in $(P_0)$ can be lower bounded as follows (where $C_1' > 0$ is suitably large):
$$
\begin{aligned}
V_{(P_0)}(\hat{t}) = \sum_{i=1}^k p_i \cdot \lambda_i\hat{t}_i
&= \sum_{i=1}^k p_i \cdot \hat{\lambda}(p_i)\hat{t}_i + \sum_{i=1}^k p_i \cdot \big( \lambda_i - \hat{\lambda}(p_i) \big)\hat{t}_i \\
&\ge V_{(\hat{P})}(\hat{t}) - d\bar{M}k\max_{1\le i\le k}\|\lambda_i - \hat{\lambda}(p_i)\|T
\ge V_{(\hat{P})}(\hat{t}) - C_1'\delta_n.
\end{aligned}
$$
On the other hand, consider an optimal solution $t^*$ to $(P_0)$. It is easy to see that such a solution needs to satisfy $\sum_{i=1}^k t_i^* \ge T'$, where $T' = \min_{1\le j\le m} x_j/(\bar{a}M)$. Indeed, if this were not the case, then one could strictly improve the objective function by lengthening the time over which one applies a price yielding positive revenues. In turn, this implies that for at least one $i'$, $t_{i'}^* \ge T'/k$. Let $\eta_n = \max\{\tau_n, C_2'\delta_n\}$ with $C_2' > 0$ suitably large, and define $\tilde{t} = (t^* - \eta_n e)^+$. Note that $\tilde{t}_i \le t_i^*$ for $i = 1, \ldots, k$ and $\tilde{t}_{i'} = t_{i'}^* - \eta_n$ for $n$ sufficiently large. Hence $\sum_{i=1}^k \tilde{t}_i \le \sum_{i=1}^k t_i^* - \tau_n \le T - \tau_n$. In addition, we have for $\omega \in \mathcal{H}$:
$$
\begin{aligned}
\sum_{i=1}^k A\hat{\lambda}(p_i)\tilde{t}_i
&= \sum_{i:\,t_i^* > \eta_n} A\hat{\lambda}(p_i)t_i^* - \sum_{i:\,t_i^* > \eta_n} A\hat{\lambda}(p_i)\eta_n \\
&= \sum_{i:\,t_i^* > \eta_n} A\lambda_it_i^* + \sum_{i:\,t_i^* > \eta_n} A\big( \hat{\lambda}(p_i) - \lambda_i \big)t_i^* - \sum_{i:\,t_i^* > \eta_n} A\hat{\lambda}(p_i)\eta_n \\
&\stackrel{(a)}{\le} x + \max_{1\le i\le k}\|\lambda_i - \hat{\lambda}(p_i)\|T\,Ae - C_2'\delta_nA\hat{\lambda}(p_{i'})
\stackrel{(b)}{\le} x - Ae\delta_n,
\end{aligned}
$$
where (a) follows from the feasibility of $t^*$ for $(P_0)$ and the non-negativity of the elements of $A$, and (b) follows from the conditions defining $\mathcal{H}$, the fact that $C_2'$ is chosen sufficiently large, and the fact that for at least one $i'$, $t_{i'}^* > \eta_n$.
We see that $\tilde{t}$ is feasible for $(\hat{P})$ (for $\omega \in \mathcal{H}$). Let $C_3' = d\bar{M}Mk$. The cost of $\tilde{t}$ in $(\hat{P})$ can be lower bounded as follows (where $C_4' > 0$ is suitably large):
$$
\begin{aligned}
V_{(\hat{P})}(\tilde{t}) = \sum_{i=1}^k p_i \cdot \hat{\lambda}(p_i)\tilde{t}_i
&\ge \sum_{i=1}^k p_i \cdot \hat{\lambda}(p_i)t_i^* - C_3'\eta_n \\
&= \sum_{i=1}^k p_i \cdot \lambda_it_i^* + \sum_{i=1}^k p_i \cdot \big( \hat{\lambda}(p_i) - \lambda_i \big)t_i^* - C_3'\eta_n \\
&\ge V_{(P_0)}(t^*) - d\bar{M}k\max_{1\le i\le k}\|\lambda_i - \hat{\lambda}(p_i)\|T - C_3'\eta_n \\
&\ge V_{(P_0)}(t^*) - C_4'\max\{\delta_n, \eta_n\}.
\end{aligned}
$$
Since $\hat{t}$ is optimal for $(\hat{P})$, we have $V_{(\hat{P})}(\hat{t}) \ge V_{(\hat{P})}(\tilde{t})$, and (A-9) follows from the above together with $\eta_n = \max\{\tau_n, C_2'\delta_n\}$. This completes the proof.

References

Afèche, P. and Ata, B. (2005), 'Revenue management in queueing systems with unknown demand characteristics', working paper, Northwestern University.

Araman, V. F. and Caldentey, R. A. (2005), 'Dynamic pricing for non-perishable products with demand learning', working paper, New York University.

Auer, P., Cesa-Bianchi, N., Freund, Y. and Schapire, R. E. (2002), 'The nonstochastic multiarmed bandit problem', SIAM Journal on Computing 32, 48–77.

Aviv, Y. and Pazgal, A. (2005), 'Pricing of short life-cycle products through active learning', working paper, Washington University.

Ball, M. and Queyranne, M. (2006), 'Toward robust revenue management: Competitive analysis of online booking', working paper, University of Maryland.

Bertsimas, D. and Perakis, G. (2003), 'Dynamic pricing: a learning approach', working paper, Massachusetts Institute of Technology.

Bitran, G. and Caldentey, R. (2003), 'An overview of pricing models for revenue management', Manufacturing & Service Operations Management 5, 203–229.

Carvalho, A. X. and Puterman, M. L. (2005), 'Dynamic pricing and learning over short time horizons', working paper, University of British Columbia.

Cesa-Bianchi, N. and Lugosi, G. (2006), Prediction, Learning, and Games, Cambridge University Press.

Elmaghraby, W. and Keskinocak, P. (2003), 'Dynamic pricing in the presence of inventory considerations: Research overview, current practices and future directions', Management Science 49, 1287–1309.

Eren, S., Maglaras, C. and van Ryzin, G. (2006), 'Pricing and product positioning without market information', working paper, Columbia University.

Fisher, M. and Rajaram, K. (2000), 'Accurate retail testing of fashion merchandise: Methodology and application', Marketing Science 19, 226–278.

Foster, D. P. and Vohra, R. (1999), 'Regret in the on-line decision problem', Games and Economic Behavior 29, 7–35.

Gallego, G. and van Ryzin, G. (1994), 'Optimal dynamic pricing of inventories with stochastic demand over finite horizons', Management Science 40, 999–1020.

Gallego, G. and van Ryzin, G. (1997), 'A multiproduct dynamic pricing problem and its applications to network yield management', Operations Research 45, 24–41.

Gaur, V. and Fisher, M. L. (2005), 'In-store experiments to determine the impact of price on sales', Production and Operations Management 14, 377–387.

Hannan, J. (1957), 'Approximation to Bayes risk in repeated play', Contributions to the Theory of Games III, Princeton University Press, 97–139.

Iyengar, G. and Zeevi, A. (2005), 'Effects of parameter uncertainty on the design and control of stochastic systems', working paper, Columbia University.

Keller, G. and Rady, S. (1999), 'Optimal experimentation in a changing environment', The Review of Economic Studies 66, 475–507.

Kleinberg, R. and Leighton, F. T. (2003), 'The value of knowing a demand curve: Bounds on regret for online posted-price auctions', Proc. of the 44th Annual IEEE Symposium on Foundations of Computer Science.

Lai, T. L. and Robbins, H. (1985), 'Asymptotically efficient adaptive allocation rules', Advances in Applied Mathematics 6, 4–22.
Larson, C. E., Olson, L. J. and Sharma, S. (2001), 'Optimal inventory policies when the demand distribution is not known', Journal of Economic Theory 101, 281–300.

Lim, A. and Shanthikumar, J. (2005), 'Relative entropy, exponential utility, and robust dynamic pricing', working paper, University of California, Berkeley.

Lobo, M. S. and Boyd, S. (2003), 'Pricing and learning with uncertain demand', working paper, Duke University.

Maglaras, C. and Zeevi, A. (2005), 'Pricing and design of differentiated services: Approximate analysis and structural insights', Operations Research 53, 242–262.

Paschalidis, I. C. and Tsitsiklis, J. N. (2000), 'Congestion-dependent pricing of network services', IEEE/ACM Transactions on Networking 8, 171–184.

Robbins, H. (1952), 'Some aspects of the sequential design of experiments', Bull. Amer. Math. Soc. 58, 527–535.

Rusmevichientong, P., Van Roy, B. and Glynn, P. W. (2006), 'A non-parametric approach to multi-product pricing', Operations Research 54, 82–98.

Scarf, H. (1959), 'Bayes solutions of the statistical inventory problem', Annals of Mathematical Statistics 30, 490–508.

Talluri, K. T. and van Ryzin, G. J. (2005), Theory and Practice of Revenue Management, Springer-Verlag.

van Ryzin, G. and McGill, J. (2000), 'Revenue management without forecasting or optimization: An adaptive algorithm for determining airline seat protection levels', Management Science 46, 760–775.

Williams, L. and Partani, A. (2006), 'Learning through price testing', 6th INFORMS Revenue Management Conference, June 2006, http://www.demingcenter.com.