Serving Content with Unknown Demand: the High-Dimensional Regime

Sharayu Moharir (sharayu.moharir@gmail.com), Javad Ghaderi (jghaderi@ee.columbia.edu), Sujay Sanghavi (sanghavi@mail.utexas.edu), Sanjay Shakkottai (shakkott@austin.utexas.edu)
Department of ECE, University of Texas at Austin, Austin, TX 78712

ABSTRACT

In this paper we look at content placement in the high-dimensional regime: there are n servers, and O(n) distinct types of content. Each server can store and serve O(1) types at any given time. Demands for these content types arrive, and have to be served in an online fashion; over time, there are a total of O(n) of these demands. We consider the algorithmic task of content placement: determining which types of content should be on which server at any given time, in the setting where the demand statistics (i.e. the relative popularity of each type of content) are not known a-priori, but have to be inferred from the very demands we are trying to satisfy. This is the high-dimensional regime because this scaling (everything being O(n)) prevents consistent estimation of demand statistics; it models many modern settings where large numbers of users, servers and videos/webpages interact in this way. We characterize the performance of any scheme that separates learning and placement (i.e. which uses a portion of the demands to gain some estimate of the demand statistics, and then uses the same for the remaining demands), showing it is order-wise strictly suboptimal. We then study a simple adaptive scheme, which myopically attempts to store the most recently requested content on idle servers, and show that it outperforms schemes that separate learning and placement. Our results also generalize to the setting where the demand statistics change with time.
Overall, our results demonstrate that separating the estimation of demand, and the subsequent use of the same, is strictly suboptimal.

Categories and Subject Descriptors: C.2.4 [Computer-Communication Networks]: Distributed Systems

Keywords: Content Delivery Systems; Large Scale Systems; Content Replication Strategies; Matchings

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org. SIGMETRICS'14, June 16–20, 2014, Austin, Texas, USA. Copyright 2014 ACM 978-1-4503-2789-3/14/06 ...$15.00. http://dx.doi.org/10.1145/2591971.2591978.

1. INTRODUCTION

Ever-increasing volumes of multimedia content are now requested and delivered over the Internet. Content delivery systems (e.g., YouTube [23]), consisting of a large collection of servers (each with limited storage/service capability), process and service these requests. Naturally, the storage and content replication strategy (i.e., what content should be stored on each of these servers) forms an important part of the service and storage architecture. Two trends have emerged in such settings of large-scale distributed content delivery systems. First, there has been a sharp rise in not just the volume of data, but indeed in the number of content-types (e.g., the number of distinct YouTube videos) that are delivered to users [23].
Second, the popularity and demand for most of this content is uneven and ephemeral; in many cases, a particular content-type (e.g., a specific video clip) becomes popular for a small interval of time after which the demand disappears; further, a large fraction of the content-types languish in the shadows with almost no demand [1, 5]. To understand the effect of these trends, we study a stylized model for content placement and delivery in large-scale distributed content delivery systems. The system consists of n servers, each with constant storage and service capacities, and αn content-types (α is some constant number). We consider the scaling where the system size n tends to infinity. The requests for the content-types arrive dynamically over time and need to be served in an online manner by the free servers storing the corresponding contents. The requests that are "deferred" (i.e., cannot be immediately served by a free server with the requested content-type) incur a high cost. To ensure reliability, we assume that there are alternate server resources (e.g., a central server with large enough backup storage and service capacity, or additional servers that can be freed up on demand) that can serve such deferred requests. The performance of any content placement strategy crucially depends on the popularity distribution of the content. Empirical studies of many services such as YouTube, Peer-to-Peer (P2P) VoD systems, various large video streaming systems, and web caching [2, 5, 7, 17, 24] have shown that access to different content-types is very inhomogeneous and typically matches well with power-law (Zipf-like) distributions, i.e., the request rate for the i-th most popular content-type is proportional to i^{-β}, for some parameter β > 0. For the performance analysis, we assume that the content-types have a popularity that is governed by some power-law distribution with unknown β, and further that this distribution changes over time.
Our objective is to provide efficient content placement strategies that minimize the number of requests deferred. It is natural to expect that content placement strategies in which more popular content-types are replicated more will have good performance. However, there is still a lot of flexibility in designing such strategies, and the extent of replication of each content-type has to be determined. Moreover, the requests arrive dynamically over time and the popularities of different content-types might vary significantly over time; thus the content placement strategy needs to be online and robust. The fact that the number of contents is very large and their popularities are time-varying creates two new challenges that are not present in traditional queueing systems. First, it is imperative to measure the performance of content replication strategies over the time scale in which changes in popularities occur. In particular, the steady-state metrics typically used in queueing systems are not the right measure of performance in this context. Second, the number of content-types is enormous and learning the popularities of all content-types over the time scale of interest is infeasible. This is in contrast with traditional multi-class multi-server systems, where the number of demand classes does not scale with the number of servers (the low-dimensional setting) and thus learning the demand rates can be done in a time duration that does not scale with the system size.

1.1 Contributions

The main contributions of our work can be summarized as follows.

High dimensional vs. low dimensional: We consider the high-dimensional regime where the number of servers, the number of content-types, and the number of requests to be served over any time interval all scale as O(n); further, the demand statistics are not known a-priori. This scaling means that consistent estimation of demand statistics is not possible.
We show that a "learn-and-optimize" approach, namely, learning the demand statistics based on requests and then locally caching content on servers according to these empirical statistics, is strictly suboptimal (even when using high-dimensional estimators such as the Good-Turing estimator [13]). This is in contrast to the conventional low-dimensional setting (finite number of content-types), where the "learn-and-optimize" approach is asymptotically optimal.

Adaptive vs. learn-and-optimize: We study an adaptive content replication strategy which myopically attempts to cache the most recently requested content-types on idle servers. Our key result is that even this simple adaptive strategy strictly outperforms any content placement strategy based on the "learn-and-optimize" approach. Our results also generalize to the setting where the demand statistics change with time. Overall, our results demonstrate that separating the estimation of demands and the subsequent use of the estimates to design optimal content placement policies is strictly suboptimal in the high-dimensional setting.

1.2 Organization and Basic Notations

The rest of this paper is organized as follows. We describe our system model and setting in Section 2. The main results are presented in Section 3. Our simulation results are discussed in Section 4. Section 5 contains the proofs of some of our key results. Section 7 gives an overview of related works. We finally end the paper with conclusions. The rest of the proofs can be found in the Appendix.

Some of the basic notations are as follows. Given two functions f and g, we write f = O(g) if lim sup_{n→∞} |f(n)/g(n)| < ∞, and f = Ω(g) if g = O(f). If both f = O(g) and f = Ω(g), then f = Θ(g). Similarly, f = o(g) if lim_{n→∞} |f(n)/g(n)| = 0, and f = ω(g) if g = o(f). The term w.h.p. means "with high probability as n → ∞".

2.
SETTING AND MODEL

In this section, we consider a stylized model for large-scale distributed content systems that captures two emerging trends, namely, a large number of content-types, and uneven and time-varying demands.

2.1 Server and Storage Model

The system consists of n front-end servers, each with constant storage and service capacity, and a back-end server that contains a catalog of m content-types (one copy of each content-type, e.g., a copy of each YouTube video). The contents can be copied from the back-end server and placed on the front-end servers. Each front-end server can store at most d content pieces (d is a constant) and serve at most d requests at each time, under the constraint that no two requests can read the same content piece simultaneously on a server.¹ The system is essentially equivalent to a system of nd servers, each with 1 content piece of storage. Since we are interested in the scaling performance, as n, m → ∞, for clarity we assume that there are n servers and each server can store 1 content and can serve 1 request at any time.

¹ Even without the constraint "no two requests can read the same content piece simultaneously on a server", the performance can be bounded from above by the performance of a system with dn servers with a storage of 1 each, and from below by that of another system with n servers with a storage of 1 each. Thus, asymptotically in a scaling sense, the system is still equivalent to a system of n servers where each server can store 1 content and can serve 1 content request at any time.

2.2 Service Model

When a request for a content arrives, it is routed to an idle (front-end) server which has the corresponding content-type stored on it, if possible. We assume that the service time of each request is exponentially distributed with mean 1. The requests have to be served in an online manner; further, service is non-preemptive, i.e., once a request is assigned to
a server, its service cannot be interrupted and also cannot be re-routed to another server. Requests that cannot be served (no free server with the requested content-type) incur a high cost (e.g., they need to be served by the back-end server, or content needs to be fetched from the back-end server and loaded on to a new server). As discussed before, we refer to such requests as deferred requests. The goal is to design content placement policies such that the number of requests deferred is minimized.

2.3 Content Request Model

There are m content-types (e.g., m distinct YouTube videos). We consider the setting where the number of content-types m is very large and scales linearly with the system size n, i.e., m = αn for some constant α > 1. We assume that requests for each content arrive according to a Poisson process and that the request rates (popularities) follow a Zipf distribution. Formally, we make the following assumptions on the arrival process.

Assumption 1. (Arrival and Content Request Process)
- The arrival process for each content-type i is a Poisson process with rate λi.
- The load on the system at any time is λ̄ < 1, where λ̄ = (1/n) Σ_{i=1}^{m} λi.
- Without loss of generality, content-types are indexed in the order of popularity. The request rate for content-type i is λi = nλ̄pi, where pi ∝ i^{-β} for some β > 0. This is the Zipf distribution with parameter β.

We have used the Zipf distribution to model the popularity distribution of the various contents because empirical studies in many content delivery systems have shown that the distribution of popularities matches well with such distributions, see, e.g., [2, 5, 7, 17, 24].

2.4 Time Scales of Change in Arrival Process

A key trend discussed earlier is the time-varying nature of popularities in content delivery systems [1, 5]. For example, the empirical study in [5] (based on 25 million transactions on YouTube) shows that the daily top-100 list of videos changes frequently.
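As a concrete illustration of Assumption 1, the aggregate request stream can be simulated as a Poisson process of rate nλ̄ whose arrivals are thinned across content-types according to the Zipf popularities. This is a minimal sketch; the function names and default parameter values are ours, not the paper's.

```python
import random

def zipf_popularities(m, beta):
    """Normalized Zipf(beta) popularity vector: p_i proportional to i^(-beta)."""
    weights = [i ** (-beta) for i in range(1, m + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def sample_requests(n, alpha=1.5, beta=1.2, load=0.8, horizon=5.0, seed=0):
    """Sample (time, content_type) request pairs over [0, horizon].

    Aggregate arrivals form a Poisson process of rate n*load; each arrival
    is a request for type i with probability p_i (thinning, as in Assumption 1,
    since the superposition of the per-type Poisson processes has rate n*load).
    """
    rng = random.Random(seed)
    m = int(alpha * n)
    p = zipf_popularities(m, beta)
    requests, t = [], 0.0
    while True:
        t += rng.expovariate(n * load)   # exponential inter-arrival times
        if t > horizon:
            break
        # choose a content-type according to the Zipf popularities
        i = rng.choices(range(m), weights=p)[0]
        requests.append((t, i))
    return requests
```

A per-type Poisson process of rate λi = nλ̄pi and this thinned construction are statistically identical, which is why one aggregate clock suffices.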
To understand the effect of this trend on the performance of content placement strategies, we consider the following two change models.

Block Change Model: In this model, we assume that the popularity of the various content-types remains constant for some duration of time T(n), and then changes to some other arbitrarily chosen distribution that satisfies Assumption 1. Thus T(n) reflects the time-scale over which changes in popularities occur. Under this model, we characterize the performance of content placement strategies over such a time-scale T(n).

Continuous Change Model: Under this model, we assume that each content-type has a Poisson clock of some constant rate ν > 0. Whenever the clock of content-type i ticks, content-type i exchanges its popularity with some other content-type j, chosen uniformly at random. Note that if T(n) = Θ(1), the average time over which the popularity distribution "completely" changes is comparable to that of the Block Change Model; however, here the change occurs incrementally and continuously. Note that this model ensures that the content-type popularity always has the Zipf distribution. Under this model, we characterize the performance of content placement strategies over constant intervals of time.

3. MAIN RESULTS AND DISCUSSION

In this section, we state and discuss our main results. The proofs are provided in Section 5.

3.1 Separating Learning from Content Placement

In this section, we analyze the performance of storage policies which separate the task of learning from that of content placement as follows. Consider time intervals of length T(n). The operation of the policy in each time interval is divided into two phases:

Phase 1. Learning: Over this interval of time, use the demands from the arrivals (see Figure 1) to estimate the content-type popularity statistics.

Phase 2. Storage: Using the estimated popularity of the various content-types, determine which content-types are to be replicated and stored on each server.
The storage is fixed for the remaining time interval. The content-types not requested even once in the learning phase are treated equally in the storage phase. In other words, the popularity of all content-types unseen in the learning phase is assumed to be the same.

Figure 1: Learning-Based Static Storage Policies – The interval T(n) is split into the Learning and Storage phases. The length of time spent in the Learning phase can be chosen optimally using the knowledge of the value of T(n) and the Zipf parameter β.

Further, we allow the interval of time for the Learning phase potentially to be chosen optimally using knowledge of T(n) (the interval over which statistics remain stationary) and β (the Zipf parameter for content-type popularity). This is a natural class of policies to consider because it is obvious that popular content-types should be stored on more servers than the less popular content-types. Therefore, knowing the arrival rates can help in the design of better storage policies. Moreover, for the content-types which are not seen in the learning phase, the storage policy has no information about their relative popularity. It is therefore natural to treat them as if they are equally popular. The replication and storage in Phase 2 (Storage) can be performed by any static policy that relies on the knowledge (estimate) of arrival rates, e.g., the proportional placement policy [10], where the number of copies of each content-type is proportional to its arrival rate, or the storage policy of [11], which was shown to be approximately optimal in the steady state. We now analyze the performance of learning-based static storage policies under the Block Change Model defined in Section 2.4, where the statistics remain invariant over the time intervals of length T(n).
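The two-phase structure above can be sketched as a toy pipeline: count the Phase 1 requests, then perform a proportional placement for Phase 2. This is an illustrative sketch with a largest-remainder-style rounding of our choosing; it is not the exact rule of [10] or [11], and unseen types get no dedicated copy in this toy version.

```python
from collections import Counter

def learn_then_place(phase1_requests, n_servers):
    """Phase 1: count requested types. Phase 2: static proportional placement.

    Returns a length-n_servers list giving the content-type stored on each
    server, with copies roughly proportional to empirical popularity.
    """
    counts = Counter(phase1_requests)
    total = sum(counts.values())
    if total == 0:
        return [None] * n_servers   # nothing learned: placement is arbitrary
    placement = []
    # provisional allocation: floor of each type's proportional share
    for ctype, c in counts.most_common():
        placement.extend([ctype] * (n_servers * c // total))
    # hand leftover servers to the most popular types, round-robin
    ranked = [ctype for ctype, _ in counts.most_common()]
    k = 0
    while len(placement) < n_servers:
        placement.append(ranked[k % len(ranked)])
        k += 1
    return placement[:n_servers]
```

The placement produced here is then frozen for the rest of the interval, which is exactly the rigidity that Theorem 1 below exploits.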
The performance metric of interest is the number of requests deferred by any policy belonging to the class of learning-based static storage policies in the interval of interest. We assume that at the beginning of this interval, the storage policy has no information about the relative popularity of the various content-types. Therefore, we start with an initial loading where each content-type is placed on exactly one server. This loading is not changed during Phase 1 (the learning phase), at the end of which the content-types on idle servers are changed as per the new storage policy. As mentioned before, this storage is not changed for the remaining duration of the interval of interest. The following theorem provides a lower bound on the number of requests deferred by any learning-based static storage policy.

Theorem 1. Under Assumption 1 and the Block Change Model defined in Section 2.4, for β > 2, if T(n) = Ω(1), the expected number of requests deferred by any learning-based static storage policy is Ω(n^{1/2}).

We therefore conclude that even if the division of the interval of interest into Phase 1 (Learning) and Phase 2 (Storage) is done in the optimal manner, every learning-based static storage policy defers Ω(n^{1/2}) jobs in the interval of interest. Therefore, Theorem 1 provides a fundamental lower bound on the number of jobs deferred by any policy which separates learning and storage. It is worth pointing out that this result holds even when the time-scale of change in statistics is quite slow. Thus, even when T(n), the time-scale over which the statistics remain invariant, goes to infinity, and the time duration of the two phases (Learning, Storage) is chosen optimally based on β and T(n), Ω(n^{1/2}) requests are still deferred. Next, we explore adaptive storage policies which perform the tasks of learning and storage simultaneously.

3.2 Myopic Joint Learning and Placement

We next study a natural adaptive storage policy called MYOPIC.
In an adaptive storage policy, depending on the requests that arrive and depart, the content-type stored on a server can be changed when that server is idle, even while other servers of the system are busy serving requests. Therefore, adaptive policies perform the tasks of learning and placement jointly. Many variants of such adaptive policies have been studied for decades in the context of cache management (e.g., LRU, LRU-MIN [21]). Let Ci refer to the i-th content-type, 1 ≤ i ≤ m. The MYOPIC policy works as follows: when a request for content-type Ci arrives, it is assigned to a server if possible, or deferred otherwise. Recall that a deferred request is a request for which, on arrival, no currently idle server can serve it, and thus its service invokes a backup mechanism such as a back-end server which can serve it at a high cost. After the assignment/defer decision is made, if there is no currently idle server with content-type Ci, MYOPIC replaces the content-type of one of the idle servers with Ci. This idle server is chosen as follows:

- If there is a content-type Cj stored on more than one currently idle server, the content-type of one of those servers is replaced with Ci;
- Else, place Ci on the currently idle server whose content-type has been requested least recently among the content-types on the currently idle servers.

For a formal definition of MYOPIC, refer to Figure 2.

1: On arrival (request for Ci) do:
2: Allocate the request to an idle server if possible.
3: if no other idle server has a copy of Ci, then
4:   if ∃ j: Cj stored on > 1 idle servers, then
5:     replace Cj with Ci on any one of them.
6:   else
7:     find Cj: least recently requested on idle servers; replace Cj with Ci.
8:   end if
9: end if

Figure 2: MYOPIC – An adaptive storage policy which changes the content stored on idle servers in a greedy manner to ensure that recently requested content pieces are available on idle servers.

Remark 1. Some key properties of MYOPIC are:

1.
The content-types on servers can potentially be changed only when there is an arrival.

2. The content-type of at most one idle server is changed after each arrival. However, for many popular content-types, it is likely that there is already an idle server with that content-type, in which case there is no content-type change.

3. To implement MYOPIC, the system needs to maintain a list of content-types ordered according to the time at which the most recent request for each content-type was made.

The following theorem provides an upper bound on the number of requests deferred by MYOPIC for the Block Change Model defined in Section 2.4.

Theorem 2. Under Assumption 1 and the Block Change Model defined in Section 2.4, over any time interval T(n) such that T(n) = o(n^{β−1}), the number of requests deferred by MYOPIC is O((nT(n))^{1/β}) w.h.p.

We now compare this upper bound with the lower bound on the number of requests deferred by any learning-based static storage policy obtained in Theorem 1.

Corollary 1. Under Assumption 1, the Block Change Model defined in Section 2.4, and for β > 2, over any time interval T(n) such that T(n) = Ω(1) and T(n) = o(n^{β/2−1}), the expected number of requests deferred by any learning-based static storage policy is Ω(n^{1/2}), and the number of requests deferred by the MYOPIC policy is o(n^{1/2}) w.h.p.

From Corollary 1, we conclude that MYOPIC outperforms all learning-based static storage policies. Note that:

i. Corollary 1 holds even when the interval of interest T(n) grows to infinity (scaling polynomially in n), or correspondingly, even when the content-type popularity changes very slowly with time.

ii. Even if the partitioning of the interval T(n) into a Learning phase and a Static Storage phase is done in an optimal manner with the help of some side information (β, T(n)), the MYOPIC algorithm outperforms any learning-based static storage policy.

iii.
Since we consider the high-dimensional setting, the learning problem at hand is a large-alphabet learning problem. It is well known that standard estimation techniques, like using the empirical values as estimates of the true statistics, are suboptimal in this setting. Many learning algorithms, like the classical Good-Turing estimator [13] and other linear estimators [16], have been proposed and shown to have good performance for the problem of large-alphabet learning. From Corollary 1, we conclude that, even if the learning-based storage policy uses the best possible large-alphabet estimator, it cannot match the performance of the MYOPIC policy. Therefore, in the high-dimensional setting we consider, separating the task of estimating the demand statistics from the subsequent use of the same to design a static storage policy is strictly suboptimal. This is the key message of this paper.

Theorem 2 characterizes the performance of MYOPIC under the Block Change Model, where the statistics of the arrival process do not change in the interval of interest. To gain further insight into the robustness of MYOPIC against changes in the arrival process, we now analyze the performance of MYOPIC when the arrival process can change in the interval of interest according to the Continuous Change Model defined in Section 2.4. Recall that under the Continuous Change Model, on average, we expect Θ(n) shuffles in the popularity of the various content-types in an interval of constant duration. For the Block Change Model, if T(n) = Θ(1), the entire popularity distribution can change at the end of the block, which is equivalent to n shuffles. Therefore, for both change models, the expected number of changes to the popularity distribution in an interval of constant duration is of the same order. However, these changes occur constantly but slowly in the Continuous Change Model, as opposed to a one-shot change in the Block Change Model.

Theorem 3.
Under Assumption 1 and the Continuous Change Model defined in Section 2.4, the number of requests deferred by the MYOPIC storage policy in any interval of constant duration is O(n^{1/β}) w.h.p.

In view of Theorem 2, if the arrival rates do not vary in an interval of constant duration, under the MYOPIC storage policy, the number of requests deferred in that interval is O(n^{1/β}) w.h.p. Theorem 3 implies that the number of requests deferred in a constant duration interval is of the same order even if the arrival rates change according to the Continuous Change Model. This shows that the performance of the MYOPIC policy is robust to changes in the popularity statistics.

3.3 Genie-Aided Optimal Storage Policy

In this section, our objective is to study the setting where the content-type statistics are available "for free". We consider the setting where the popularity statistics are known, and show that a simple adaptive policy is optimal in the class of all policies which know the popularity statistics of the various content-types. We denote the class of such policies by A and refer to the optimal policy as the GENIE policy. Let the content-types be indexed from i = 1 to m and let Ci be the i-th content-type. Without loss of generality, we assume that the content-types are indexed in the order of popularity, i.e., λi ≥ λi+1 for all i ≥ 1. Let k(t) denote the number of idle servers at time t. The key idea of the GENIE storage policy is to ensure that at any time t, if the number of idle servers is k(t), the k(t) most popular content-types are stored on exactly one idle server each. The GENIE storage policy can be implemented as follows. Recall that Ci is the i-th most popular content-type. At time t:

- If there is a request for content-type Ci with i < k(t−), then allocate the request to the corresponding idle server. Further, replace the content-type on the server storing C_k(t−) with content-type Ci.
- If there is a request for content-type Ci with i > k(t−), defer this request.
There is no storage update.

- If there is a request for content-type Ci with i = k(t−), then allocate the request to the corresponding idle server. There is no storage update.
- If a server becomes idle (due to a departure), replace its content-type with C_k(t−)+1.

For a formal definition, please refer to Figure 3.

1: Initialize: number of idle servers k := n.
2: while true do
3:   if a new request (for Ci) is routed to a server, then
4:     if i ≠ k, then
5:       replace the content-type of the idle server storing Ck with Ci
6:     end if
7:     k ← k − 1
8:   end if
9:   if departure, then
10:    replace the content-type of the new idle server with Ck+1
11:    k ← k + 1
12:  end if
13: end while

Figure 3: GENIE – An adaptive storage policy which has content popularity statistics available "for free". At time t, if the number of idle servers is k(t), the k(t) most popular content-types are stored on exactly one idle server each.

Remark 2. Some key properties of GENIE are:

1. The implementation of GENIE requires replacing the content-type of at most one server on each arrival and departure.

2. The GENIE storage policy only requires knowledge of the relative popularity of the various content-types.

To characterize the performance of GENIE, we assume that the system starts from the empty state (all servers idle) at time t = 0. The performance metric for any policy A is D^(A)(t), defined as the number of requests deferred by time t under the adaptive storage policy A. We say that an adaptive storage policy O is optimal if

D^(O)(t) ≤st D^(A)(t),   (1)

for any storage policy A ∈ A and any time t ≥ 0, where (1) means that P(D^(O)(t) > x) ≤ P(D^(A)(t) > x) for all x ≥ 0 and t ≥ 0.

Theorem 4. If the arrival process to the content delivery system is Poisson and the service times are exponential random variables with mean 1, for the Block Change Model defined in Section 2.4, let D^(A)(t) be the number of requests deferred by time t under the adaptive storage policy A ∈ A.
Then, we have that

D^(GENIE)(t) ≤st D^(A)(t),

for any storage policy A ∈ A and any time t ≥ 0.

Note that this theorem holds even if the λi's are not distributed according to the Zipf distribution. We thus conclude that GENIE is the optimal storage policy in the class of all storage policies which, at time t, have no additional knowledge of the future arrivals except the values of λi for all content-types and the arrivals and departures in [0, t). Next, we compute a lower bound on the performance of GENIE.

Theorem 5. Under Assumption 1, for β > 2, the Block Change Model defined in Section 2.4, and if the interval of interest is of constant length, the expected number of requests deferred by GENIE is Ω(n^{2−β}).

- The "Good Turing + Static Storage" policy (used in the comparisons of Section 4) uses the Good-Turing estimator [13] to compute an estimate of the missing mass at the end of the learning phase. The missing mass is defined as the total probability mass of the content-types that were not requested in the learning phase. Recall that we assume that learning-based static storage policies treat all the missing content-types equally, i.e., all missing content-types are estimated to be equally popular. Let M0 be the total probability mass of the content-types that were not requested in the learning phase, and let S1 be the set of content-types which were requested exactly once in the learning phase. The Good-Turing estimator of the missing mass (M̂0) is given by

M̂0 = |S1| / (number of samples).

See [13] for details. Let Ni be the number of times content-type i was requested in the learning phase and Cmissing be the set of content-types not requested in the learning phase. The "Good Turing + Static Storage" policy computes an estimate of the content popularity as follows:

i. If Ni = 0, p̂i = M̂0 / |Cmissing|.
ii. If Ni > 0, p̂i = (1 − M̂0) · Ni / (number of samples).
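A runnable sketch of the Good-Turing-style estimate just described; the seen-type rule (scaling empirical shares by 1 − M̂0) is our reconstruction of the policy's formula, and all function and variable names are ours.

```python
from collections import Counter

def good_turing_estimates(samples, catalog_size):
    """Popularity estimates from learning-phase requests, Good-Turing style.

    M0_hat = (#types seen exactly once) / (#samples) estimates the missing
    mass; it is split evenly over unseen types, while seen types keep their
    empirical shares scaled by (1 - M0_hat).
    """
    n = len(samples)
    counts = Counter(samples)
    singletons = sum(1 for c in counts.values() if c == 1)
    m0_hat = singletons / n
    n_unseen = catalog_size - len(counts)
    est = {}
    for ctype in range(catalog_size):
        if ctype in counts:
            est[ctype] = (1.0 - m0_hat) * counts[ctype] / n   # seen types
        else:
            est[ctype] = m0_hat / n_unseen if n_unseen else 0.0  # unseen types
    return est
```

By construction the estimates sum to one whenever some type is unseen, since the empirical shares carry mass 1 − M̂0 and the unseen types share M̂0.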
From Theorems 2 and 5, we see that there is a gap between the performance of the MYOPIC policy and the GENIE policy (which has additional knowledge of the content-type popularity statistics). Since, for the GENIE policy, learning the statistics of the arrival process comes for "free", this gap provides an upper bound on the cost of serving content with unknown demand. We compare the performance of all the policies considered so far in the next section via simulations.

4. SIMULATION RESULTS

We compare the performance of the MYOPIC policy with the performance of the GENIE policy and the following two learning-based static storage policies:

- The "Empirical + Static Storage" policy uses the empirical popularity statistics of content-types in the learning phase as estimates of the true popularity statistics:

p̂i = Ni / (number of samples).

At the end of the learning phase, the number of servers on which a content-type is stored is proportional to its estimated popularity.

We simulate the content distribution system for arrival and service processes which satisfy Assumption 1, to compare the performance of the four policies mentioned above and also to understand how their performance depends on various parameters like the system size (n), the load (λ̄) and the Zipf parameter (β). In Tables 1, 2 and 3, we report the mean and standard deviation of the fraction of jobs served by the policies over a duration of 5 s (T(n) = 5) for α = 1. For each set of system parameters, we repeat the simulations between 1000 and 10000 times for each policy, in order to ensure that the standard deviation of the quantity of interest (fraction of jobs served) is small and comparable.
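For reference, the two adaptive policies compared in these simulations can be written as compact runnable sketches. One server holds one content-type; the dictionary bookkeeping and tie-breaking below are our choices, and GENIE is reduced to a popularity-rank counter (its deferral count depends only on the number of idle servers and each request's rank).

```python
class Myopic:
    """Sketch of MYOPIC (Figure 2): one server stores one content-type."""
    def __init__(self, placement):
        self.idle = dict(enumerate(placement))  # idle server -> stored content
        self.busy = {}                          # busy server -> content being served
        self.last_req = {}                      # content -> time of last request
        self.deferred = 0

    def arrival(self, t, c):
        self.last_req[c] = t
        server = next((s for s, x in self.idle.items() if x == c), None)
        if server is not None:
            self.busy[server] = self.idle.pop(server)  # serve on an idle copy
        else:
            self.deferred += 1                         # no idle copy: defer
        if self.idle and c not in self.idle.values():
            seen, victim = set(), None
            for s, x in self.idle.items():             # prefer a duplicated type
                if x in seen:
                    victim = s
                    break
                seen.add(x)
            if victim is None:                         # else least recently requested
                victim = min(self.idle,
                             key=lambda s: self.last_req.get(self.idle[s], -1.0))
            self.idle[victim] = c

    def departure(self, server):
        self.idle[server] = self.busy.pop(server)

class Genie:
    """Sketch of GENIE (Figure 3): the k idle servers always hold the
    k most popular content-types, i.e., popularity ranks 0..k-1."""
    def __init__(self, n):
        self.k = n          # number of idle servers
        self.deferred = 0

    def arrival(self, rank):
        if rank < self.k:
            self.k -= 1        # served; idle servers re-load to ranks 0..k-2
        else:
            self.deferred += 1 # no idle copy of this type: defer

    def departure(self):
        self.k += 1            # freed server loads the next most popular type
```

Feeding both classes the same arrival/departure trace (e.g., one generated under Assumption 1) and comparing the `deferred` counters reproduces the kind of comparison reported in the tables below.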
For the two adaptive policies (GENIE and MYOPIC), the results are averaged over 1000 iterations; for the learning-based policies ("Empirical + Static Storage" and "Good Turing + Static Storage"), the results are averaged over 10000 iterations. In addition, the results for the learning-based policies are reported for empirically optimized values of the fraction of time the policy spends learning the distribution.

In Table 1, we compare the performance of the policies for different values of the system size (n). For the results reported in Table 1, the "Empirical + Static Storage" policy learns for 0.1 s and the "Good Turing + Static Storage" policy learns for 0.7 s. The performance of all four policies improves as the system size increases, and the adaptive policies significantly outperform the two learning-based static storage policies. Figure 4 is a plot of the mean values reported in Table 1.

Policy                        n     Mean    σ
GENIE                         200   0.9577  0.0081
                              400   0.9698  0.0045
                              600   0.9752  0.0034
                              800   0.9788  0.0030
                              1000  0.9814  0.0025
MYOPIC                        200   0.8995  0.0258
                              400   0.9260  0.0167
                              600   0.9380  0.0132
                              800   0.9481  0.0101
                              1000  0.9532  0.0080
Empirical + Static Storage    200   0.6292  0.0662
                              400   0.6918  0.0443
                              600   0.7246  0.0353
                              800   0.7464  0.0304
                              1000  0.7622  0.0268
Good Turing + Static Storage  200   0.6875  0.0274
                              400   0.7249  0.0180
                              600   0.7443  0.0140
                              800   0.7566  0.0118
                              1000  0.7651  0.0104

Table 1: The performance of the four policies as a function of the system size (n) for fixed values of load λ̄ = 0.8 and β = 1.5. The values reported are the mean and standard deviation (σ) of the fraction of jobs served. Both adaptive policies (GENIE and MYOPIC) significantly outperform the two learning-based static storage policies.

In Table 2, we compare the performance of the policies for different values of the Zipf parameter β. For the results reported in Table 2, the duration of the learning phase for both learning-based policies is fixed such that the expected number of arrivals in that duration is 100.
The performance of all four policies improves as the value of the Zipf parameter β increases; however, the MYOPIC policy outperforms both learning-based static storage policies for all values of β considered.

In Table 3, we compare the performance of the policies for different values of the load λ̄. For the results reported in Table 3, the duration of the learning phase for both learning-based policies is fixed such that the expected number of arrivals in that duration is 100. The performance of all four policies deteriorates as the load increases; however, for all loads considered, the MYOPIC policy outperforms the two learning-based static storage policies.

Figure 4: Plot of the mean values reported in Table 1 – performance of the storage policies as a function of system size (n) for λ̄ = 0.8 and β = 1.5.

5. PROOFS OF MAIN RESULTS

In this section, we provide the proofs of our main results.

5.1 Proof of Theorem 1

We first present an outline of the proof of Theorem 1. We consider two cases, and first focus on the case in which the learning-based storage policy uses fewer than n arrivals to learn the distribution.

1. If the learning phase lasts for the first n^γ arrivals, for some 0 < γ ≤ 1, we show that under Assumption 1, w.h.p., there are no arrivals in the learning phase for content-types k, k+1, ..., m = αn, where k = (n^γ log n)^{1/(β−1)} + 1 (Lemma 1).

2. Next, we show that w.h.p., among the first n^γ arrivals, i.e., during the learning phase, Ω(n^γ) requests are deferred (Lemma 3).

3. Using Lemma 1, we compute a lower bound on the number of requests deferred in Phase 2 (after the learning phase) by any learning-based static storage policy (Lemma 4).

4. Using Steps 2 and 3, we lower bound the number of requests deferred in the interval of interest.

In the case when the learning phase lasts for more than n arrivals, we show that the number of requests deferred in the learning phase alone is Ω(n), thus proving the theorem for this case.

Lemma 1. Let E1 be the event that in the first n^γ arrivals, for 0 < γ ≤ 1, there are no arrivals of types k, k+1, ..., m = αn, where k = (n^γ log n)^{1/(β−1)} + 1.
Then,

P(E1) ≥ exp( −1/(2 log n) ),   (2)

for n large enough.

Proof. Recall that λi = λ̄ n pi, where pi = i^{−β}/Z(β) and Z(β) = Σ_{i=1}^{m} i^{−β}. We have

Z(β) = Σ_{i=1}^{αn} i^{−β} ≥ ∫_{1}^{αn+1} x^{−β} dx ≥ 0.9/(β−1),

for n large enough. Therefore, for all i,

pi ≤ ((β−1)/0.9) i^{−β}.

The total mass of all content types i = k, ..., m = αn is

Σ_{i=k}^{αn} pi ≤ ((β−1)/0.9) Σ_{i=k}^{αn} i^{−β} ≤ ((β−1)/0.9) ∫_{k−1}^{αn} x^{−β} dx ≤ (1/0.9) · 1/(k−1)^{β−1}.

For k = (n^γ log n)^{1/(β−1)} + 1, we have that

P(E1) ≥ ( 1 − (1/0.9) · 1/(k−1)^{β−1} )^{n^γ} ≥ exp( −1/(2 log n) ),

for n large enough.

Policy                        β    Mean    σ
GENIE                         2.0  0.9940  0.0025
                              3.5  0.9998  0.0013
                              5.0  0.9997  0.0017
MYOPIC                        2.0  0.9774  0.0063
                              3.5  0.9981  0.0024
                              5.0  0.9990  0.0014
Empirical + Static Storage    2.0  0.8592  0.0206
                              3.5  0.9328  0.0138
                              5.0  0.9458  0.0092
Good Turing + Static Storage  2.0  0.8448  0.0231
                              3.5  0.9314  0.0139
                              5.0  0.9457  0.0091

Table 2: The performance of the four policies as a function of the Zipf parameter (β) for fixed values of system size n = 500 and load λ̄ = 0.9. The values reported are the mean and standard deviation (σ) of the fraction of jobs served. The MYOPIC policy outperforms the two learning-based static storage policies for all values of β considered.

Policy                        λ̄      Mean    σ
GENIE                         0.500  0.9892  0.0025
                              0.725  0.9788  0.0013
                              0.950  0.9531  0.0017
MYOPIC                        0.500  0.9605  0.0113
                              0.725  0.9484  0.0105
                              0.950  0.8973  0.0221
Empirical + Static Storage    0.500  0.7756  0.0222
                              0.725  0.7705  0.0238
                              0.950  0.7352  0.0235
Good Turing + Static Storage  0.500  0.7849  0.0230
                              0.725  0.7589  0.0249
                              0.950  0.6869  0.0348

Table 3: The performance of the four policies as a function of the load (λ̄) for fixed values of system size n = 500 and β = 1.2. The values reported are the mean and standard deviation (σ) of the fraction of jobs served.
The MYOPIC policy significantly outperforms the two learning-based static storage policies for all loads considered.

We use the following concentration result for Exponential random variables.

Lemma 2. Let X_k, 1 ≤ k ≤ v, be i.i.d. Exponential random variables with mean 1. Then,

P( Σ_{k=1}^{v} X_k ≤ a ) ≤ (a/v)^v exp(v − a).   (3)

The proof of Lemma 2 follows from elementary calculations.

Lemma 3. Suppose the system starts with each content piece stored on exactly one server. Let E2 be the event that in the first n^γ arrivals, for γ such that 0 < γ ≤ 1, at most ((n^γ log n)^{1/(β−1)} + 1)(log n + 1) requests are served (not deferred). Then, for β > 2,

P(E2) ≥ 1 − 1/(2 log n).   (4)

Proof. This proof is conditioned on the event E1 defined in Lemma 1. Conditioned on E1, in the first n^γ arrivals, at most (n^γ log n)^{1/(β−1)} + 1 different content types are requested. Therefore, at most (n^γ log n)^{1/(β−1)} + 1 servers can serve requests during the first n^γ arrivals.

Let E3 be the event that the time taken for the first n^γ arrivals is less than 2n^γ/(λ̄n). Since the expected time for the first n^γ arrivals is n^γ/(λ̄n), by the Chernoff bound, P(E3) ≥ 1 − o(1/n). The rest of this proof is conditioned on the event E3.

If the system serves (does not defer) more than ((n^γ log n)^{1/(β−1)} + 1)(log n + 1) requests in this interval, at least one server must serve more than log n requests. By substituting a = c n^{γ−1} and v = log n in Lemma 2, we have that

P( Σ_{k=1}^{log n} X_k ≤ c n^{γ−1} ) ≤ exp(log n − c n^{γ−1}) ( c n^{γ−1} / log n )^{log n} = o(1/n).

Therefore, the probability that a server serves more than log n requests in an interval of length 2n^γ/(λ̄n) is o(1/n). Therefore, using the union bound, the probability that none of these (n^γ log n)^{1/(β−1)} + 1 servers serves more than log n requests each in 2n^γ/(λ̄n) time is greater than 1 − ((n^γ log n)^{1/(β−1)} + 1) o(1/n). Therefore, we have that

P(E2^c) ≤ ((n^γ log n)^{1/(β−1)} + 1) o(1/n) + P(E1^c) + P(E3^c) ≤ 1/(2 log n),

for n large enough.

Lemma 4.
Let the interval of interest have length T(n) with T(n) = Ω(1). If the learning phase of the storage policy lasts for the first n^γ arrivals, 0 < γ ≤ 1, the expected number of requests deferred in Phase 2 is Ω( T(n) n^{1−γ} / log n ).

Proof. Let N2 be the number of arrivals in Phase 2; then we have that E[N2] = T(n)λ̄n − n^γ. Let E4 be the event that N2 > E[N2]/2. Using the Chernoff bound, it can be shown that P(E4^c) = o(1/n). The rest of this proof is conditioned on E1, defined in Lemma 1, and E4, defined above. We consider the following two cases, depending on the number of servers allocated to content types not seen in Phase 1.

Case I: The number of servers allocated to content types not seen in Phase 1 is less than εn for some ε ≤ 1 − λ̄/1000. For β > 2,

Z(β) = Σ_{i=1}^{αn} i^{−β} ≤ Σ_{i=1}^{αn} i^{−2} ≤ Σ_{i=1}^{∞} i^{−2} = π²/6.

Therefore, for all i, pi ≥ (6/π²) i^{−β}. The total mass of all content types k, k+1, ..., αn is

Σ_{i=k}^{αn} pi ≥ (6/π²) Σ_{i=k}^{αn} i^{−β} ≥ (6/π²) ∫_{k}^{αn+1} x^{−β} dx ≥ 0.54/(π²(β−1)) · 1/k^{β−1},

for n large enough. Therefore, the expected number of arrivals of types k, k+1, ..., m = αn, where k = (n^γ log n)^{1/(β−1)} + 1, in Phase 2 is at least

( (T(n)λ̄n − n^γ)/2 ) · 0.54/(π²(β−1)) · 1/(n^γ log n).

Let E5 be the event that in Phase 2, there are at least ( (T(n)λ̄n − n^γ)/2 ) · 0.54/(2π²(β−1)) · 1/(n^γ log n) arrivals of types k, k+1, ..., m = αn, where k = (n^γ log n)^{1/(β−1)} + 1. Using the Chernoff bound, P(E5^c) = o(1/n).

Conditioned on E1, none of the content types k, k+1, ..., m = αn, where k = (n^γ log n)^{1/(β−1)} + 1, is requested in Phase 1. Recall that all learning-based policies treat all these content types equally, and that the total number of servers allocated to store the content types not seen in Phase 1 is less than εn. Let η be the probability that Ck, for k ≥ (n^γ log n)^{1/(β−1)} + 1, is not stored by the storage policy under consideration. Then,

η ≥ 1 − εn / ( αn − (n^γ log n)^{1/(β−1)} − 1 ),

which is bounded away from zero for n large enough (since ε < 1 < α). Let E6 = E1 ∩ E3 ∩ E4 ∩ E5 and let D2 be the number of requests deferred in Phase 2.
Then,

E[D2 | E6] ≥ η · ( (T(n)λ̄n − n^γ)/2 ) · 0.54/(2π²(β−1)) · 1/(n^γ log n) = Ω( T(n) n^{1−γ} / log n ).

Therefore,

E[D2] ≥ E[D2 | E6] P(E6) ≥ E[D2 | E6] ( 1 − 1/log n − 3/n ) = Ω( T(n) n^{1−γ} / log n ).

Case II: The number of servers allocated to content types not seen in Phase 1 is more than εn for some ε > 1 − λ̄/1000. Let f(n) be the number of servers allocated to store all the content types that are requested in Phase 1. By our assumption, f(n) ≤ (λ̄/1000) n.

Let C1 be the set of content types requested in Phase 1 and let p = Σ_{c∈C1} pc be the total mass of all content types c ∈ C1. Let p̂c be the fraction of requests for content-type c in Phase 1. By the definition of C1, the total empirical mass of all content types c ∈ C1 is p̂ = Σ_{c∈C1} p̂c = 1. Recall that there are n^γ arrivals in Phase 1; let r = n^γ. We now use the Chernoff bound to compute a lower bound on the true mass p, using a technique similar to that used in [13] (Lemma 4). By the Chernoff bound, we know that

P( p̂ > (1+κ) p ) ≤ exp( −p r κ²/3 ).

Let δ = exp( −p r κ²/3 ); then, we have that, with probability greater than 1 − δ,

p̂ − p ≤ √( 3 p log(1/δ) / r ).

Solving for p (recall p̂ = 1), we get that, with probability greater than 1 − δ, p > 1 − 3 log(1/δ)/(2r), for n large enough. Let δ = 1/n; then, with probability greater than 1 − 1/n,

p > 1 − 3 log n/(2n^γ).

Conditioned on the event E4, there are at least (T(n)λ̄n − n^γ)/2 arrivals in Phase 2. The remainder of this proof is conditioned on E4. Let A2 be the number of arrivals of types c ∈ C1 in Phase 2. Let E7 be the event that

A2 > ( (T(n)λ̄n − n^γ)/2 ) ( 1 − 3 log n/(2n^γ) ).

Since the expected number of arrivals of content types c ∈ C1 in Phase 2 is at least (T(n)λ̄n − n^γ)(1 − 3 log n/(2n^γ)), using the Chernoff bound we can show that P(E7^c) = o(1/n). The rest of this proof is conditioned on E7. By our assumption, the number of servers which can serve arrivals of types c ∈ C1 in Phase 2 is f(n).
Therefore, if at least A2/2 requests are to be served in Phase 2, the sum of the service times of these A2/2 requests must be less than T(n)f(n) (since the number of servers which can serve these requests is f(n)). Let E8 be the event that the sum of A2/2 independent Exponential random variables with mean 1 is less than T(n)f(n). By substituting v = A2/2 and a = T(n)f(n) in Lemma 2, we have that

P(E8) ≤ exp( A2/2 − T(n)f(n) ) ( 2T(n)f(n)/A2 )^{A2/2} = o(1/n),

for n large enough. Hence,

P( D2 ≥ A2/2 ) ≥ 1 − P(E1^c) − o(1/n)  ⇒  E[D2] = Ω( T(n) n^{1−γ} / log n ).

Proof. (of Theorem 1) We consider two cases.

Case I: The learning phase lasts for the first n^γ arrivals, where 0 < γ ≤ 1. Let D1 be the number of requests deferred in Phase 1 and D be the total number of requests deferred in the interval of interest. Then, we have that E[D] = E[D1] + E[D2]. By Lemmas 3 and 4, and since T(n) = Ω(1), we have that

E[D] ≥ n^γ − (n^γ log n)^{1/(β−1)} log n + E[D2] = Ω(n^{0.5}).

Case II: The learning phase lasts for longer than the time taken for the first n arrivals. By Lemma 3, the number of requests deferred in the first n arrivals is at least n − (n log n)^{1/(β−1)} log n with probability greater than 1 − 1/log n. Therefore, we have that

E[D] ≥ ( n − (n log n)^{1/(β−1)} log n ) ( 1 − 1/log n ) = Ω(n^{0.5}).

5.2 Proof of Theorem 2

We first present an outline of the proof of Theorem 2.

1. We first show that under Assumption 1, on every arrival in the interval of interest (of length T(n)), there are Θ(n) idle servers w.h.p. (Lemma 6).

2. Next, we show that w.h.p., in the interval of interest of length T(n), only O( (nT(n))^{1/β} ) unique content types are requested (Lemma 7).

3. Conditioned on Steps 1 and 2, we show that the MYOPIC policy ensures that, in the interval of interest, once a content type is requested for the first time, there is always at least one idle server which can serve an incoming request for that content.

4.
Using Step 3, we conclude that, in the interval of interest, only the first request for a particular content type can be deferred. The proof of Theorem 2 then follows from Step 2.

Lemma 5. Let the cumulative arrival process to the content delivery system be a Poisson process with rate λ̄n. At time t, let χ(t) be the number of occupied servers under the MYOPIC storage policy. Then, we have that χ(t) ≤st S(t), where S(t) is a Poisson random variable with mean λ̄n(1 − e^{−t}).

Proof. Consider an M/M/∞ queue where the arrival process is Poisson(λ̄n), and let S(t) be the number of occupied servers at time t in this system. It is well known that S(t) is a Poisson random variable with mean λ̄n(1 − e^{−t}); we provide a proof of this result for completeness. Consider a request r* which arrived into the system at time t0 < t. If the request is still being served by a server, we have that t0 + μ(r*) > t, where μ(r*) is the service time of request r*. Since μ(r*) ∼ Exp(1), we have that P(μ(r*) > t − t0 | t0) = e^{−(t−t0)}. Since, given a single arrival in [0, t], its arrival time t0 is uniform on [0, t],

P(r* is in the system at time t) = (1/t) ∫_0^t e^{−(t−t0)} dt0.

Thinning the arrival process with this probability yields S(t) ∼ Poisson(λ̄n(1 − e^{−t})).

To show χ(t) ≤st S(t), we use a coupled construction similar to that of Figure 5. The intuition behind the proof is the following: the rate of arrivals to the content delivery system and to the M/M/∞ system (where each server can serve all types of requests) is the same, but the content delivery system serves fewer requests than the M/M/∞ system because some requests are deferred even when servers are idle. Hence, the number of busy servers in the content delivery system is stochastically dominated by the number of busy servers in the M/M/∞ queueing system.

Lemma 6. Let the interval of interest be [t0, t0 + T(n)], where T(n) = o(n^{β−1}), and let ε ≤ (1 − λ̄)/2. Let F1 be the event that at the instant of each arrival in the interval of interest, the number of idle servers in the system is at least (1 − λ̄ − ε)n. Then, P(F1^c) = o(1/n).

Proof.
Let F2 be the event that the number of arrivals in [t0, t0 + T(n)] is at most nT(n)(λ̄ + ε). Using the Chernoff bound for the Poisson process, we have that P(F2^c) = o(1/n).

Consider any t ∈ [t0, t0 + T(n)]. By Lemma 5, χ(t) ≤st S(t), where S(t) ∼ Poisson(λ̄n(1 − e^{−t})). Therefore, P(χ(t) > (λ̄ + ε)n) ≤ P(S(t) > (λ̄ + ε)n). Moreover, S(t) ≤st W(t), where W(t) ∼ Poisson(λ̄n). Therefore, using the Chernoff bound for W(t), we have that

P(S(t) > (λ̄ + ε)n) ≤ P(W(t) > (λ̄ + ε)n) = e^{−c1 n},

for some constant c1 > 0. Therefore,

P(F1^c) ≤ P(F2^c) + (λ̄ + ε)nT(n) P(χ(t) > (λ̄ + ε)n) = o(1/n).

Lemma 7. Let F3 be the event that in the interval of interest of duration T(n), with T(n) = o(n^{β−1}), no more than O((nT(n))^{1/β}) different types of content are requested. Then, P(F3^c) = o(1/n).

Proof. Recall from the proof of Lemma 1 that the total mass of all content types k, ..., m = αn satisfies Σ_{i=k}^{αn} pi ≤ (1/0.9) · 1/(k−1)^{β−1}. Now, for k = (nT(n))^{1/β} + 1, we have that

Σ_{i=k}^{αn} pi ≤ (1/0.9) (nT(n))^{−(β−1)/β}.

Conditioned on the event F2 defined in Lemma 6, the expected number of requests for content types k, k+1, ..., αn is less than (1/0.9)(λ̄ + ε)(nT(n))^{1/β}. Using the Chernoff bound, the probability that there are more than (2/0.9)(λ̄ + ε)(nT(n))^{1/β} requests for content types k, k+1, ..., αn in the interval of interest is less than 1/n², for n large enough. Therefore, with probability greater than 1 − 1/n² − P(F2^c), the number of different content types requested in the interval of interest is less than (nT(n))^{1/β} + (2/0.9)(λ̄ + ε)(nT(n))^{1/β}. Hence the result follows.

Proof. (of Theorem 2) Let F4 be the event that, in the interval of interest, no request for a content type, except the first request for that type, is deferred. The rest of this proof is conditioned on F1 and F3. Let U(t) be the number of unique contents which have been requested in the interval of interest before time t, for t ∈ [t0, t0 + T(n)].
Conditioned on F3, as defined in Lemma 7, U(t) ≤ k1 (nT(n))^{1/β} for some constant k1 > 0 and n large enough. Conditioned on F1, there are always at least (1 − λ̄ − ε)n idle servers in the interval of interest.

CLAIM: For every i and n large enough, once a content Ci is requested for the first time in the interval of interest, the MYOPIC policy ensures that there is always at least one idle server which can serve a request for Ci.

Note that since T(n) = o(n^{β−1}), we have (nT(n))^{1/β} = o(n). Let n be large enough that k1 (nT(n))^{1/β} < (1 − λ̄ − ε)n, i.e., at any time t ∈ [t0, t0 + T(n)], the number of idle servers is greater than U(t). We prove the claim by induction. Let the claim hold at time t−, and let there be a request at time t for content Ci. If this is not the first request for Ci in [t0, t0 + T(n)], by the claim, at t−, there is at least one idle server which can serve this request. In addition, if there is exactly one server which can serve Ci at t−, then the MYOPIC policy replaces the content of some other idle server with Ci. Since there are more than k1 (nT(n))^{1/β} idle servers and U(t) < k1 (nT(n))^{1/β}, at t+, each content type requested so far in the interval of interest is stored on at least one currently idle server. Therefore, conditioned on F1 and F3, every request for a particular content type, except the first such request, is served.

Hence, putting everything together,

P(F4) ≥ 1 − P(F1^c) − P(F3^c),

thus P(F4) → 1 as n → ∞ and the result follows.

Proof. (of Corollary 1) From Theorem 2 we have that, w.h.p., the number of requests deferred by the MYOPIC storage policy is O((nT(n))^{1/β}) = o(n^{0.5}), and by Theorem 1, we know that the expected number of requests deferred by any learning-based storage policy is Ω(n^{0.5}).
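The MYOPIC rule analyzed above can be sketched as a toy discrete-event simulation. This is our own sketch, not the authors' code: details such as one storage slot per server, and overwriting an arbitrary idle server's slot on a miss, are assumptions consistent with the description in the proof.

```python
import heapq
import random

def simulate_myopic(n, m, lam_bar, beta, T, seed=0):
    """Toy simulation of the MYOPIC policy (sketch, with assumed details).

    n servers with one content slot each; catalog of m Zipf(beta) types.
    Requests arrive as a Poisson process of rate lam_bar*n; service
    times are Exp(1). A request is served by an idle server storing
    its type, if one exists; otherwise it is deferred, and the type is
    copied onto some idle server so later requests can be served.
    Returns the fraction of requests served."""
    rng = random.Random(seed)
    weights = [i ** -beta for i in range(1, m + 1)]
    total = sum(weights)
    probs = [w / total for w in weights]

    store = list(range(n))      # store[s] = content type held by server s
    idle = set(range(n))        # ids of idle servers
    busy = []                   # min-heap of (finish_time, server)
    t, served, deferred = 0.0, 0, 0
    while t < T:
        t += rng.expovariate(lam_bar * n)      # next arrival
        while busy and busy[0][0] <= t:        # free finished servers
            _, s = heapq.heappop(busy)
            idle.add(s)
        c = rng.choices(range(m), probs)[0]    # Zipf-popular request
        match = next((s for s in idle if store[s] == c), None)
        if match is not None:
            idle.remove(match)
            heapq.heappush(busy, (t + rng.expovariate(1.0), match))
            served += 1
        else:
            deferred += 1
            if idle:                           # cache type for the future
                store[next(iter(idle))] = c
    return served / (served + deferred)

frac = simulate_myopic(n=200, m=200, lam_bar=0.5, beta=1.5, T=2.0)
print(0.0 <= frac <= 1.0)
```

The simulation starts, as in Lemma 3, with each content type on exactly one server; in line with the proof, after a type's first (possibly deferred) request, subsequent requests for it tend to find an idle server holding it.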
5.3 Proof of Theorem 3

In this section, we first present an outline of the proof of Theorem 3, followed by the proof details.

1. Since we are studying the performance of the MYOPIC policy under the Continuous Change Model, the relative order of popularity of contents keeps changing in the interval of interest. We show that w.h.p., the number of content types which are among the n^{1/β} most popular content types at least once in the interval of interest is O(n^{1/β}) (Lemma 8).

2. Next, we show that w.h.p., in the interval of interest of length b, only O(n^{1/β}) content types are requested (Lemma 9).

3. By Lemma 6 and the proof of Theorem 2, we know that, conditioned on Steps 1 and 2, the MYOPIC storage policy ensures that in the interval of interest, once a content type is requested for the first time, there is always at least one idle server which can serve an incoming request for that content. Using this, we conclude that, in the interval of interest, only the first request for a particular content type can be deferred. The proof of Theorem 3 then follows from Step 2.

Lemma 8. Let G1 be the event that, in the interval of interest of length b, the number of times that a content among the current top n^{1/β} most popular contents changes its position in the popularity ranking is at most (4bν/α) n^{1/β}. Then, P(G1) ≥ 1 − o(1/n).

Proof. The expected number of clock ticks in b time-units is bnν. The probability that a change in the arrival process involves at least one of the current n^{1/β} most popular contents is n^{1/β}/(αn). Therefore, the expected number of changes in the arrival process which involve at least one of the current n^{1/β} most popular contents is (2bν/α) n^{1/β}. By the Chernoff bound, we have that P(G1) ≥ 1 − o(1/n).

Lemma 9. Let G2 be the event that in the interval of interest, no more than O(n^{1/β}) different types of content are requested. Then, P(G2^c) = o(1/n).

Proof. Conditioned on the event G1 defined in Lemma 8, we have that in the interval of interest, at most (2bν/α + 1) n^{1/β} different contents are among the top n^{1/β} most popular contents at some point. Given this, the proof follows the same line of arguments as in the proof of Lemma 7.

Proof. (of Theorem 3) The proof of Theorem 3 follows from Lemma 9 and uses the same line of arguments as in the proof of Theorem 2.
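The counting behind Lemmas 8 and 9 can be illustrated numerically. The sketch below is ours and uses hypothetical dynamics (uniformly random adjacent swaps in the ranking, which the paper does not specify) only to show that the set of contents ever appearing in the top k grows by at most one per ranking change:

```python
import random

def distinct_top_visitors(m, k, ticks, seed=0):
    """Count distinct contents ever in the top-k positions when, at each
    tick, a uniformly chosen adjacent pair in the popularity ranking is
    swapped (hypothetical change dynamics, for illustration only)."""
    rng = random.Random(seed)
    rank = list(range(m))          # rank[j] = content at popularity position j
    seen = set(rank[:k])           # contents initially in the top k
    for _ in range(ticks):
        j = rng.randrange(m - 1)
        rank[j], rank[j + 1] = rank[j + 1], rank[j]
        if j < k:                  # swap touched the top-k region
            seen.add(rank[j])      # at most one genuinely new visitor per tick
            if j + 1 < k:
                seen.add(rank[j + 1])
    return len(seen)

v = distinct_top_visitors(m=1000, k=31, ticks=5000)
print(v >= 31, v <= 31 + 5000)  # initial top-k plus at most one new per change
```

As in Lemma 9, the number of distinct "top" contents is at most the initial top-k plus the number of changes that touch the top of the ranking.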
5.4 Proof of Theorem 4

To show that GENIE is the optimal policy, we consider the process X(t), the number of occupied servers at time t when the storage policy is GENIE. Let Y(t) be the number of occupied servers at time t under some other storage policy A ∈ A. We construct a coupled process (X*(t), Y*(t)) such that the marginal rates of change of X*(t) and Y*(t) are the same as those of X(t) and Y(t) respectively. Recall that λ̄ = (1/n) Σ_{i=1}^{m} λi. At time t, let C_GENIE(t) and C_A(t) be the sets of contents stored on idle servers by GENIE and A respectively. The construction of the coupled process (X*(t), Y*(t)) is described in Figure 5. We assume that the system starts at time t = 0 with X*(0) = Y*(0) = 0. In this construction, we maintain two counters, Z_X* and Z_Y*, which keep track of the number of departures from the system; let Z_X*(0) = Z_Y*(0) = 0. Let Exp(μ) denote an Exponential random variable with mean 1/μ, and Ber(p) a Bernoulli random variable which is 1 with probability (w.p.) p.

Lemma 10. X*(t) and Y*(t) have the same marginal rates of transition as X(t) and Y(t) respectively.

Proof. This follows from the properties of the coupled process described in Figure 5, using standard arguments.

Lemma 11. Let D^(GENIE)(t) be the number of jobs deferred by time t by the GENIE adaptive storage policy, and D^(A)(t) the number of jobs deferred by time t by a policy A ∈ A. In the coupled construction, let W*(t) be the number of arrivals by time t, and let

D^{X*}(t) = W*(t) − Z_X*(t) − X*(t)  and  D^{Y*}(t) = W*(t) − Z_Y*(t) − Y*(t).

Then, D^{X*}(t) and D^{Y*}(t) have the same marginal rates of transition as D^(GENIE)(t) and D^(A)(t) respectively.

Proof. This follows from Lemma 10 and the fact that X(t) has the same distribution as X*(t), so the marginal rate of increase of D^{X*}(t) given X*(t) is the same as the rate of increase of D^(GENIE)(t) given X(t). The result for D^{Y*}(t) follows by the same argument.

Lemma 12.
X*(t) ≥ Y*(t) for all t on every sample path.

Proof. The proof follows by induction. X*(0) = Y*(0) by construction. Let X*(t0−) ≥ Y*(t0−), and let there be an arrival or departure at time t0. There are four possible cases:

i: If ARR < DEP and X*(t0−) = Y*(t0−), then Y*(t0) = Y*(t0−) + 1 only if X*(t0) = X*(t0−) + 1. Therefore, X*(t0) ≥ Y*(t0).

ii: If ARR < DEP and X*(t0−) > Y*(t0−), then Y*(t0) ≤ Y*(t0−) + 1 ≤ X*(t0−) ≤ X*(t0). Therefore, X*(t0) ≥ Y*(t0).

iii: If DEP < ARR and X*(t0−) = Y*(t0−), then X*(t0) = Y*(t0).

iv: If DEP < ARR and X*(t0−) > Y*(t0−), then X*(t0) = X*(t0−) − 1 ≥ Y*(t0−) ≥ Y*(t0). Therefore, X*(t0) ≥ Y*(t0).

1: Generate: ARR ∼ Exp(nλ̄), DEP ∼ Exp(max{X*, Y*})
2: t ← t + min{ARR, DEP}
3: if ARR < DEP then
4:   if (X* = Y*) then
5:     Generate u1 ∼ Ber( Σ_{i∈C_GENIE(t)} λi / (nλ̄) )
6:     if (u1 = 1) then
7:       X* ← X* + 1
8:       Generate u2 ∼ Ber( Σ_{i∈C_A(t)} λi / Σ_{i∈C_GENIE(t)} λi )
9:       if (u2 = 1) then Y* ← Y* + 1
10:    end if
11:  else
12:    Generate u1 ∼ Ber( Σ_{i∈C_GENIE(t)} λi / (nλ̄) )
13:    if (u1 = 1) then X* ← X* + 1
14:    Generate u2 ∼ Ber( Σ_{i∈C_A(t)} λi / Σ_{i∈C_GENIE(t)} λi )
15:    if (u2 = 1) then Y* ← Y* + 1
16:  end if
17: else
18:   if (X* ≥ Y*) then
19:     X* ← X* − 1, Z_X* ← Z_X* + 1
20:     Generate u3 ∼ Ber( Y*/X* )
21:     if (u3 = 1) then Y* ← Y* − 1, Z_Y* ← Z_Y* + 1
22:   else
23:     Y* ← Y* − 1, Z_Y* ← Z_Y* + 1
24:     Generate u4 ∼ Ber( X*/Y* )
25:     if (u4 = 1) then X* ← X* − 1, Z_X* ← Z_X* + 1
26:   end if
27: end if
28: Goto 1

Figure 5: Coupled Process

Lemma 13. Z_X*(t) ≥ Z_Y*(t) for all t on every sample path.

Proof. The proof follows by induction. Since the system starts at time t = 0, Z_X*(0) = Z_Y*(0).
Let Z_X*(t0−) ≥ Z_Y*(t0−), and let there be a departure at time t0. By Lemma 12, we know that X*(t0−) ≥ Y*(t0−). Therefore, Z_X*(t0) ≥ Z_Y*(t0) by the coupling construction.

Proof. (of Theorem 4) By Lemmas 12 and 13, on every sample path, X*(t) + Z_X*(t) ≥ Y*(t) + Z_Y*(t). Therefore, for every sample path, the number of requests already served (not deferred) or being served under a content delivery system implementing the GENIE policy is at least that under any other storage policy. This implies that, on each sample path, the number of requests deferred by GENIE is at most that of any other storage policy. Sample-path dominance in the coupled system implies stochastic dominance of the original process. Using this and Lemma 11, we have that D^(GENIE)(t) ≤st D^(A)(t).

6. PROOF OF THEOREM 5

Proof. (of Theorem 5) The key idea of the GENIE policy is to ensure that at any time t, if the number of idle servers is k(t), the k(t) most popular contents are stored on exactly one idle server each. Since the total number of servers is n, and the number of content-types is m = αn for some constant α > 1, content-types Ci for i > n are never stored on idle servers by the GENIE policy. This means that under the GENIE policy, all arrivals for content types Ci with i > n are deferred.

For β > 2, for all i, pi ≥ (6/π²) i^{−β}. The cumulative mass of all content types i = n+1, ..., αn is

Σ_{i=n+1}^{αn} pi ≥ (6/π²) Σ_{i=n+1}^{αn} i^{−β} ≥ (6/π²) ∫_{n+1}^{αn+1} x^{−β} dx ≥ 0.54/(π²(β−1)) · 1/(n+1)^{β−1},

for n large enough. Let the length of the interval of interest be b. The expected number of arrivals of types n+1, n+2, ..., αn in the interval of interest is at least

0.54 b λ̄ n / ( π²(β−1)(n+1)^{β−1} ).

Therefore, the expected number of jobs deferred by the GENIE policy in an interval of length b is Ω(n^{2−β}).

7.
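The Zipf tail-mass bound used in this proof can be checked numerically. The following small Python sketch (ours) compares the exact tail mass of a Zipf(β) distribution with the integral lower bound (6/π²)((n+1)^{1−β} − (αn+1)^{1−β})/(β−1):

```python
import math

def zipf_tail_mass(n, alpha, beta):
    """Exact probability mass of types n+1, ..., alpha*n under a Zipf(beta)
    distribution on {1, ..., alpha*n}."""
    m = alpha * n
    z = sum(i ** -beta for i in range(1, m + 1))          # normalizer Z(beta)
    return sum(i ** -beta for i in range(n + 1, m + 1)) / z

n, alpha, beta = 500, 2, 2.5
tail = zipf_tail_mass(n, alpha, beta)
# lower bound from the proof: (6/pi^2) * integral_{n+1}^{alpha*n+1} x^{-beta} dx
lb = (6 / math.pi ** 2) * ((n + 1) ** (1 - beta) - (alpha * n + 1) ** (1 - beta)) / (beta - 1)
print(tail >= lb, tail < 1)
```

The bound holds because, for β > 2, Z(β) ≤ π²/6, so pi ≥ (6/π²) i^{−β}, and the sum of i^{−β} dominates the corresponding integral.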
RELATED WORK

Our model of content delivery systems shares several features with recent models and analyses of content placement and request scheduling in multi-server queueing systems [10, 11, 15, 18]. All of these works either assume known demand statistics or a low-dimensional regime (thus permitting "easy" learning). Our study differs in its focus on unknown, high-dimensional, and time-varying demand statistics, which make it difficult to consistently estimate the statistics. Our setting also shares some aspects of estimating large-alphabet distributions from limited samples, with early contributions from Good and Turing [6] through recent variants of such estimators [13, 16].

Our work is also related to the rich body of work on content replication strategies in peer-to-peer networks, e.g., [3, 4, 8, 9, 12, 14, 22, 25, 26]. Replication is used in various contexts: [14] utilizes it in a setting with large storage limits, [9, 12] use it to decrease the time taken to locate specific content, [3, 25, 26] use it to increase bandwidth in the setting of video streaming, and [4] uses it to minimize the number of hosts that need to be probed to resolve a query for a file in unstructured peer-to-peer networks. However, the common assumptions are that the number of content-types does not scale with the number of peers, and that a request can be served in parallel by multiple servers (with network bandwidth that increases with the number of peers holding a specific content-type), which is fundamentally different from our setting.

Finally, our work is also related to the vast literature on content replacement algorithms in server/web cache management. As discussed in [19], parameters of the content (e.g., how large the content is, or when it was last requested) are used to derive a cost, which in turn is used to decide which content to replace.
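As an illustration (ours, not taken from [19] or [20]), cost-based replacement with recency as the cost is the familiar least-recently-used rule:

```python
from collections import OrderedDict

class LRUCache:
    """Minimal cost-based replacement sketch: the cost of an item is the
    recency of its last request; the least recently used item is evicted
    when the cache overflows."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()   # insertion order tracks recency

    def request(self, key):
        """Return True on a cache hit, False on a miss (with insertion)."""
        if key in self.items:
            self.items.move_to_end(key)      # refresh recency on a hit
            return True
        if len(self.items) >= self.capacity:
            self.items.popitem(last=False)   # evict least recently used
        self.items[key] = True
        return False

cache = LRUCache(2)
hits = [cache.request(k) for k in ["a", "b", "a", "c", "b"]]
print(hits)  # [False, False, True, False, False]
```

Here "c" evicts "b" (the least recently used item at that point), so the final request for "b" misses.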
Examples of algorithms that have a cost-based interpretation include the Least Recently Used (LRU) policy, the Least Frequently Used (LFU) policy, and the Max-Size policy [20]. We refer to [19] for a survey of web caching schemes. There is a large body of work on the performance of replication strategies in single-cache systems; however, the analysis of adaptive caching schemes in distributed cache systems under stochastic models of arrivals and departures is very limited.

8. ACKNOWLEDGEMENTS

We acknowledge the support of ** grants ***.

9. CONCLUSIONS

In this paper, we considered the high-dimensional setting where the number of servers, the number of content-types, and the number of requests to be served over any time interval all scale as O(n); further, the demand statistics are not known a-priori. This setting is motivated by the enormous number of content-types and their time-varying popularity, which prevent consistent estimation of demands. The main message of this paper is that in such settings, separating the estimation of demands from the subsequent use of the estimates to design optimal content placement policies (the "learn-and-optimize" approach) is order-wise suboptimal. This is in contrast to the low-dimensional setting, where the existence of a constant bound on the number of content-types allows asymptotic optimality of a learn-and-optimize approach.

10. REFERENCES

[1] M. Ahmed, S. Traverso, M. Garetto, P. Giaccone, E. Leonardi, and S. Niccolini. Temporal locality in today's content caching: why it matters and how to model it. ACM SIGCOMM Computer Communication Review, 43(5):5–12, October 2013.
[2] L. Breslau, P. Cao, L. Fan, G. Phillips, and S. Shenker. Web caching and Zipf-like distributions: Evidence and implications. In IEEE INFOCOM'99, pages 126–134, 1999.
[3] D. Ciullo, V. Martina, M. Garetto, E. Leonardi, and G. Torrisi. Stochastic analysis of self-sustainability in peer-assisted VoD systems. In IEEE INFOCOM, pages 1539–1547, 2012.
[4] E. Cohen and S.
Shenker. Replication strategies in unstructured peer-to-peer networks. In ACM SIGCOMM Computer Communication Review, volume 32, pages 177–190. ACM, 2002.
[5] P. Gill, M. Arlitt, Z. Li, and A. Mahanti. YouTube traffic characterization: A view from the edge. In 7th ACM SIGCOMM Conference on Internet Measurement, pages 15–28, 2007.
[6] I. J. Good. The population frequencies of species and the estimation of population parameters. Biometrika, 40(3-4):237–264, 1953.
[7] A. Iamnitchi, M. Ripeanu, and I. Foster. Small-world file-sharing communities. In IEEE INFOCOM, March 2004.
[8] J. Kangasharju, K. Ross, and D. Turner. Optimizing file availability in peer-to-peer content distribution. In INFOCOM, 2007.
[9] J. Kangasharju, J. Roberts, and K. Ross. Object replication strategies in content distribution networks. Computer Communications, 25:376–383, 2002.
[10] M. Leconte, M. Lelarge, and L. Massoulie. Bipartite graph structures for efficient balancing of heterogeneous loads. In the 12th ACM SIGMETRICS Conference, pages 41–52, 2012.
[11] M. Leconte, M. Lelarge, and L. Massoulie. Adaptive replication in distributed content delivery networks. Preprint, 2013.
[12] Q. Lv, P. Cao, E. Cohen, K. Li, and S. Shenker. Search and replication in unstructured peer-to-peer networks. In 16th International Conference on Supercomputing, 2002.
[13] A. McAllester and R. Schapire. On the convergence rate of Good-Turing estimators. In COLT Conference, pages 1–6, 2000.
[14] B. Tan and L. Massoulie. Optimal content placement for peer-to-peer video-on-demand systems. IEEE/ACM Trans. Networking, 21:566–579, 2013.
[15] J. Tsitsiklis and K. Xu. Queueing system topologies with limited flexibility. In SIGMETRICS '13, 2013.
[16] G. Valiant and P. Valiant. Estimating the unseen: An n/log(n)-sample estimator for entropy and support size, shown optimal via new CLTs. In Proceedings of the 43rd Annual ACM Symposium on Theory of Computing, pages 685–694, 2011.
[17] E. Veloso, V. Almeida, W. Meira, A.
Bestavros, and S. Jin. A hierarchical characterization of a live streaming media workload. In Proceedings of the 2nd ACM SIGCOMM Workshop on Internet Measurement, pages 117–130, 2002.
[18] R. B. Wallace and W. Whitt. A staffing algorithm for call centers with skill-based routing. Manufacturing and Service Operations Management, 7:276–294, 2007.
[19] J. Wang. A survey of web caching schemes for the Internet. ACM SIGCOMM Computer Communication Review, 29:36–46, 1999.
[20] S. Williams, M. Abrams, C. Standridge, G. Abdulla, and E. Fox. Removal policies in network caches for world-wide web documents. In SIGCOMM'96, 1996.
[21] S. Williams, M. Abrams, C. Standridge, G. Abdulla, and E. Fox. Caching proxies: limitations and potentials. In the 4th International WWW Conference, December 1995.
[22] W. Wu and J. Lui. Exploring the optimal replication strategy in P2P-VoD systems: Characterization and evaluation. IEEE Transactions on Parallel and Distributed Systems, 23, August 2012.
[23] www.youtube.com/yt/press/statistics.html.
[24] H. Yu, D. Zheng, B. Zhao, and W. Zheng. Understanding user behavior in large scale video-on-demand systems. In EuroSys, April 2006.
[25] X. Zhou and C. Xu. Optimal video replication and placement on a cluster of video-on-demand servers. In International Conference on Parallel Processing, pages 547–555, 2002.
[26] Y. Zhou, T. Fu, and D. Chiu. On replication algorithm in P2P-VoD. IEEE/ACM Transactions on Networking, pages 233–243, 2013.