Distributed Storage Allocations for Optimal Delay
Derek Leong (1), Alexandros G. Dimakis (2), Tracey Ho (1)
(1) California Institute of Technology, (2) University of Southern California
ISIT 2011, 2011-08-02

Motivation
- How should we store a data object over a set of mobile nodes, subject to a given storage budget, so as to achieve the optimal recovery delay?
- When is coding beneficial? When will uncoded replication suffice?

Introduction: Network Model
- Consider a network of n mobile storage nodes
- Assume that the number of contacts between any given pair of nodes follows a Poisson distribution with rate parameter λ; the time between contacts is therefore described by an exponential distribution with mean 1/λ

Introduction: Storage Allocation
- A source node creates a data object of unit size, and subsequently disseminates an encoded representation of it to other nodes for storage, subject to a given total storage budget T
- At the end of the dissemination process, node 1 stores $x_1$ amount of data, node 2 stores $x_2$ amount of data, and so on, such that $\sum_{i=1}^{n} x_i \le T$

Introduction: Recovery by a Data Collector
- At some time after the completion of the data dissemination process, a data collector node begins to recover the data object by contacting other nodes and accessing the coded data stored in them
- We make the simplifying assumption that the stored data is instantaneously transmitted on contact
- Let random variable D denote the recovery delay incurred by the data collector

Introduction: Objectives
- We seek an allocation $(x_1, \ldots, x_n)$ of the given budget T that produces the optimal recovery delay; specifically, we consider two objectives involving the recovery delay D:
  (i) maximization of the probability of successful recovery by a given deadline d, or recovery probability
  (ii) minimization of the
expected recovery delay
- Therefore, for each objective, we need to find (i) an optimal allocation of the given budget over the nodes, and (ii) an optimal coding scheme that jointly optimize the objective

Introduction: Objectives
- Using an appropriate code, successful recovery occurs whenever the data collector accesses at least a unit amount of data (= size of the original data object)
[Figure: graph with nodes labeled s, 1, 2, 3, 4, 5, t1, t2]
- A. G. Dimakis, P. B. Godfrey, Y. Wu, M. J. Wainwright, and K. Ramchandran, “Network coding for distributed storage systems,” Trans. Inf. Theory, Sep 2010.
- A. Jiang, “Network coding for joint storage and transmission with minimum cost,” in Proc. ISIT, Jul 2006.

Introduction: Objectives
- Therefore, assuming the use of an appropriate code (e.g. random linear coding, MDS code), we can express the recovery delay D as $D = \min\{ d \ge 0 : \sum_{i \in \mathbf{r}_d} x_i \ge 1 \}$, where $\mathbf{r}_d$ is the set of all nodes contacted by the data collector by time d

Maximizing Recovery Probability
- Let $W_1, \ldots, W_n$ be i.i.d. random variables denoting the times at which the data collector first contacts nodes 1, …, n, respectively, where $W_i \sim \mathrm{Exponential}(\lambda)$
- Observe that the data collector contacts each node by the specified deadline d > 0 independently with probability $p_{\lambda,d}$ given by $p_{\lambda,d} = P[W_i \le d] = 1 - e^{-\lambda d}$
- It follows that the probability of contacting exactly a subset r of the n nodes by time d is $p_{\lambda,d}^{|r|} (1 - p_{\lambda,d})^{n-|r|}$; the recovery probability can therefore be obtained by summing over all subsets r that allow successful recovery: $P[D \le d] = \sum_{r \subseteq \{1,\ldots,n\} :\, \sum_{i \in r} x_i \ge 1} p_{\lambda,d}^{|r|} (1 - p_{\lambda,d})^{n-|r|}$

Recovery Probability: Related Work (RECAP)
- Discussion between R. Karp, R. Kleinberg, C. Papadimitriou, E. Friedman, and others at UC Berkeley, 2005
  - Found examples for the suboptimality of symmetric allocations
  - Conjectured that there exists a symmetric optimal allocation when the number of nodes n → ∞
- S. Jain, M. Demmer, R. Patra, K.
Fall, “Using redundancy to cope with failures in a delay tolerant network,” SIGCOMM 2005
  - Considered the allocation of a transmission budget over different routes in a DTN
  - Experimentally evaluated the performance of symmetric allocations along with other heuristics
  - Related theoretical claims and proofs incomplete/inaccurate

Recovery Probability: Illustrative Example (RECAP)
n = 5 nodes, access probability $p_{\lambda,d}$ = 2/3, budget T = 7/3

Allocation   Node 1   Node 2   Node 3   Node 4   Node 5   Recovery probability
A            7/15     7/15     7/15     7/15     7/15     0.79012
B            7/6      7/6      0        0        0        0.88889
C            2/3      2/3      1/3      1/3      1/3      0.90535

Recovery Probability: Optimal Symmetric Allocation (RECAP)
- We are particularly interested in symmetric allocations because they are easy to describe and implement
- Successful recovery for the symmetric allocation with m nonempty nodes (each storing T/m) occurs if and only if at least $\lceil m/T \rceil$ out of the m nonempty nodes are accessed
- Therefore, the recovery probability of this allocation is given by $\sum_{k=\lceil m/T \rceil}^{m} \binom{m}{k} p_{\lambda,d}^{k} (1 - p_{\lambda,d})^{m-k}$
- D. Leong, A. G. Dimakis, and T. Ho, “Symmetric allocations for distributed storage,” in Proc. GLOBECOM, Dec 2010.

Recovery Probability: Optimal Symmetric Allocation (RECAP)
- The problem is nontrivial even when restricted to symmetric allocations…
- m: number of nonempty nodes in the symmetric allocation
- The recovery probability for the symmetric allocation is $P[\mathrm{Binomial}(m, p_{\lambda,d}) \ge \lceil m/T \rceil]$

Recovery Probability: Optimal Symmetric Allocation (RECAP)
- Maximal spreading (with coding) is optimal among symmetric allocations when the contact rate λ or recovery deadline d is sufficiently large:
  PROPOSITION 1: If […], then either […] or […] is an optimal symmetric allocation.
- Minimal spreading (uncoded replication) is optimal among symmetric allocations when the contact rate λ or recovery deadline d is sufficiently small:
  PROPOSITION 2: If […], then […] is an optimal symmetric allocation.
- D. Leong, A. G. Dimakis, and T.
Ho, “Symmetric allocations for distributed storage,” in Proc. GLOBECOM, Dec 2010.

Recovery Probability: Optimal Symmetric Allocation (RECAP)
[Figure with the following annotations]
- Minimal spreading (uncoded replication) is optimal among symmetric allocations
- Other symmetric allocations may be optimal in the gap
- Maximal spreading (with coding) is optimal among symmetric allocations
- When […] exactly, we observe numerically that minimal spreading is optimal among symmetric allocations for most values of T; the optimal symmetric allocation changes continually over the intervals […], while […] is optimal for […]

Minimizing Expected Recovery Delay
- By considering the derivative of the recovery probability wrt d, we obtain the following expression for the expected recovery delay: $E[D] = \frac{1}{\lambda} \sum_{r \subseteq \{1,\ldots,n\} :\, \sum_{i \in r} x_i < 1} \frac{1}{(n - |r|) \binom{n}{|r|}}$
- CONJECTURE: A symmetric optimal allocation always exists for any n and T
- Observe that given n, λ, and T, the optimal allocation depends only on n and T, but not λ; this is in contrast with the maximization of the recovery probability, for which the optimal allocation depends on all parameters n, λ, T, and d

Expected Delay: Related Work
- T. Spyropoulos, K. Psounis, C. S.
Raghavendra, “Spray and Wait: An efficient routing scheme for intermittently connected mobile networks,” ACM SIGCOMM Workshop on DTN 2005
  - Spray a fixed number of uncoded replicas into the network, and wait for one of them to come into contact with the data collector
  - Showed that this fixed budget approach performs very well compared to other heuristics

Expected Delay: Optimal Symmetric Allocation
- Finding the optimal symmetric allocation…
- m: number of nonempty nodes in the symmetric allocation
- The expected recovery delay for the symmetric allocation is $\frac{1}{\lambda} \sum_{i = m - \lceil m/T \rceil + 1}^{m} \frac{1}{i} = \frac{1}{\lambda} \left( H_m - H_{m - \lceil m/T \rceil} \right)$

Expected Delay: Optimal Symmetric Allocation
- We are able to characterize the optimal symmetric allocation completely:
  RESULT 1: Suppose […], where […]. If […], then […] is an optimal symmetric allocation. If […], then either […] or […] is an optimal symmetric allocation.
- If T is an integer (i.e. ℓ = 1), then m = T, which corresponds to minimal spreading (uncoded replication), is optimal
- As the fractional part of T increases (i.e. ℓ increases), the amount of spreading (with coding) in the optimal symmetric allocation increases

Expected Delay: Optimal Symmetric Allocation
Proof Idea: Eliminating candidates for the optimal symmetric allocation…
1. We can show that an optimal m* can be found from among […] candidates: […]
2. For […], where […], the expected recovery delay is given by […]
3. Using a geometrical argument, we show that the choice of […] minimizes the expected recovery delay among all […] where […]
4. To demonstrate the optimality of […], i.e.
[…], we apply the following bounds for the harmonic number $H_n$: […]

Simulation Study
- We apply our theoretical insights to the design of a simple data dissemination and storage protocol for a delay tolerant network
- Simulations allow us to capture the transient dynamics of the data dissemination process, and its interaction with the data recovery process
- Our goal is to understand how different symmetric allocations perform under different circumstances:
  - Random waypoint mobility model vs real-world mobility traces
  - Low vs high mobility
  - Low vs high connectivity
  - Starting recovery immediately vs after some time

Simulation Study: Generalized Spray and Wait
- Our protocol extends SPRAY AND WAIT by allowing nodes to store coded packets that are each 1/w the size of the original data object
- Successful recovery occurs when the data collector accesses at least w such packets (choosing w = 1 produces the original protocol)
- Different symmetric allocations of the given budget T can be realized by changing the value of parameter w

Simulation Study: Random Waypoint: Key Observations
- Number of wireless mobile nodes n = 100
- Plots show how the required wait time varies with the desired recovery probability $P_S$
- Each line represents a specific choice of parameter w
- Expected recovery delay performance is consistent with our analysis: minimal spreading is optimal in most plots
- Effect of increased connectivity appears less straightforward, e.g. phase transition not evident for recovery starting at time 0: data dissemination process impeded by greater interference?
- High-mobility scenario plots appear to be vertically scaled versions of the baseline scenario plots: speeding up of time
- Recovery start time appears to have a limited impact on how different allocations perform relative to each other: most noticeable effect of starting recovery at time 0 is the reduced spread in performance
- Recovery probability performance is consistent with our analysis: phase transition in the optimal symmetric allocation is clearly discernible in most plots
- In the high recovery probability regime, maximal spreading (with coding) can lead to a significant reduction in the required wait time

Simulation Study: Mobility Traces: Key Observations
- Number of wireless taxi cabs n = 100
- Plots show how the required wait time varies with the desired recovery probability $P_S$
- Each line represents a specific choice of parameter w
- Despite nonideal conditions, many of our previous observations still apply here
- In the high recovery probability regime, maximal spreading (with coding) can lead to a significant reduction in the required wait time
- Plots show distinct “jumps” in wait times: reduced mobility of cabs at night

Summary: Theoretical Analysis
The optimal symmetric allocations are not the same for both objectives…
(i) Maximization of Recovery Probability: For any budget T, there is a phase transition from a regime where minimal spreading (uncoded replication) is optimal to a regime where maximal spreading (with coding) is optimal, as the access probability p (or the deadline d) increases
(ii) Minimization of Expected Recovery Delay: With the averaging over both regimes, minimal spreading (uncoded replication) turns out to be optimal whenever the budget T is an integer; the amount of spreading in the optimal symmetric allocation increases with the fractional part of T
- Performance gap between minimal spreading and maximal spreading can be quite
substantial, e.g. for the required wait time in both the low and high recovery probability regimes

Summary: Simulation Study
- Results of the simulation study are consistent with our analytical findings
- Provides clear evidence that the choice of storage allocation can have a significant impact on the recovery delay performance
- Shows how mobility, connectivity, and recovery start time may affect performance

Future Work
- The simple contact model assumed here can be generalized to the case where a variable amount of data is transmitted during each contact between nodes
- Allow nonuniform contact rates $\lambda_i$ between the data collector and individual nodes

Thank you!

Additional Simulation Results

Simulation Study: Random Waypoint (Budget T = 5)
- Number of wireless mobile nodes n = 100
- Plots show how the required wait time varies with the desired recovery probability $P_S$
- Each line represents a specific choice of parameter w

Simulation Study: Random Waypoint (Budget T = 10)
- Number of wireless mobile nodes n = 100
- Plots show how the required wait time varies with the desired recovery probability $P_S$
- Each line represents a specific choice of parameter w

Simulation Study: Random Waypoint (Budget T = 20)
- Number of wireless mobile nodes n = 100
- Plots show how the required wait time varies with the desired recovery probability $P_S$
- Each line represents a specific choice of parameter w

Simulation Study: Mobility Traces (Budget T = 5)
- Number of wireless taxi cabs n = 100
- Plots show how the required wait time varies with the desired recovery probability $P_S$
- Each line represents a specific choice of parameter w

Simulation Study: Mobility Traces (Budget T = 10)
- Number of wireless taxi cabs n = 100
- Plots show how the required wait time varies with the desired recovery probability $P_S$
- Each line represents a specific choice of parameter w

Simulation Study: Mobility Traces (Budget T = 20)
- Number of wireless taxi cabs n = 100
- Plots show how the required wait time varies with the desired recovery probability $P_S$
- Each line represents a specific choice of parameter w
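
Worked check of the analysis (not part of the original slides)
The following is a minimal Python sketch, under stated assumptions, that recomputes the recovery probabilities of the illustrative example (n = 5 nodes, access probability 2/3, budget T = 7/3) by summing over all subsets of accessed nodes. It also evaluates the expected recovery delay of a symmetric allocation with m nonempty nodes as the mean of the ⌈m/T⌉-th smallest of m i.i.d. exponential contact times. The function names are hypothetical, and the use of exact rational arithmetic is a choice made here to avoid rounding at the recovery threshold.

```python
from fractions import Fraction
from itertools import combinations
from math import ceil


def recovery_probability(allocation, p):
    """P[accessed data >= 1] when each node is accessed independently w.p. p."""
    n = len(allocation)
    total = Fraction(0)
    for k in range(n + 1):
        for subset in combinations(range(n), k):
            # A subset allows recovery when it holds at least one unit of data.
            if sum(allocation[i] for i in subset) >= 1:
                total += p ** k * (1 - p) ** (n - k)
    return total


def expected_delay_symmetric(m, T, lam):
    """Expected delay when budget T is spread evenly over m nonempty nodes.

    Assumes T >= 1. Recovery needs k = ceil(m / T) of the m nodes, so the
    delay is the k-th smallest of m i.i.d. Exponential(lam) contact times,
    whose mean is (1 / lam) * sum_{j=0}^{k-1} 1 / (m - j).
    """
    k = ceil(Fraction(m) / Fraction(T))
    return sum(Fraction(1, m - j) for j in range(k)) / lam


# Allocations A, B, C from the illustrative example (hypothetical variable names).
p = Fraction(2, 3)
ALLOCATIONS = {
    "A": [Fraction(7, 15)] * 5,
    "B": [Fraction(7, 6)] * 2 + [Fraction(0)] * 3,
    "C": [Fraction(2, 3)] * 2 + [Fraction(1, 3)] * 3,
}

for name, x in ALLOCATIONS.items():
    print(name, float(recovery_probability(x, p)))
```

The exact results are 64/81, 8/9, and 220/243, which round to the values 0.79012, 0.88889, and 0.90535 quoted in the example. For an integer budget such as T = 2 with λ = 1, `expected_delay_symmetric(2, 2, 1)` gives 1/2 while `expected_delay_symmetric(4, 2, 1)` gives 7/12, in line with the observation that minimal spreading is optimal when T is an integer.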