A Stochastic Primal-Dual Algorithm for Joint Flow Control and MAC Design in Multi-hop Wireless Networks

Junshan Zhang and Dong Zheng
Department of Electrical Engineering, Arizona State University
March 22, 2006

Multi-hop Wireless Networks

[Figure: a multi-hop wireless network subject to bursty traffic, channel fading, and random access.]

Stochastic attributes of wireless networks:
• time-varying channel conditions;
• random access, co-channel interference;
• burstiness of traffic flows;
• mobility, dynamic network topology;
• ...

Objective and Approach

• Goal: QoS provisioning through joint flow control and MAC design in multi-hop random access networks.
• Approach: (stochastic) network utility maximization.
  – Treat rate control as a utility maximization problem.
  – Different layers function "cooperatively" to achieve the optimum (equilibrium) point.

Basic Setting

• Consider a wireless network modelled as a directed graph $G = (N, E)$.
• Let $U_s(x_s)$ denote the utility function of flow $s$, where $x_s$ is the flow rate.
• Consider a persistence scheme with transmission probabilities $\{p_{(i,j)},\ \forall (i,j) \in E\}$.
• Let $N_I^{to}(i)$ denote the set of nodes whose transmissions interfere with node $i$'s reception.
• Use $N_{in}(i)$ to denote the set of nodes from which node $i$ receives traffic, $N_{out}(i)$ to denote the set of nodes to which node $i$ is sending packets, and $N_I^{from}(i)$ to denote the set of nodes whose reception is interfered with by node $i$'s transmission.

Problem Formulation

Problem $\Xi$:

  $\max_{\{x_s\}} \sum_{s \in S} U_s(x_s)$
  subject to
    $\sum_{s \in S((i,j))} x_s \le c_{(i,j)}\, p_{(i,j)} \prod_{k \in N_I^{to}(j)} (1 - P_k), \quad \forall (i,j)$
    $\sum_{j \in N_{out}(i)} p_{(i,j)} = P_i, \quad \forall i$
    $0 \le x_s \le M_s, \quad \forall s$
    $0 \le P_i \le 1, \quad \forall i,$

where $M_s$ is the maximum data rate of flow $s$, $c_{(i,j)}$ is the average transmission rate, and the utility function $U_s(\cdot)$ takes the following general form [Mo, Walrand 00]:

  $U_\kappa(r_i) = \begin{cases} w_i \log r_i, & \text{if } \kappa = 1 \\ w_i (1-\kappa)^{-1} r_i^{1-\kappa}, & \text{otherwise.} \end{cases}$

Observation: Problem $\Xi$ is non-convex and non-separable.

Related work

1. Internet TCP/AQM: [Kelly, Maulloo, Tan 98], [Low, Lapsley 99], [Srikant 05], and many more (sorry, due to limited space);
2. Joint flow control/routing/MAC/PHY design: [Chiang 04], [Lin, Shroff 05], [Wang, Kar 05], [Chen, Low, Chiang, Doyle 06], [Lee, Chiang, Calderbank 06], ...

A common assumption: most of the works above assume deterministic feedback in their distributed algorithms.

Outline

1. Deterministic primal-dual algorithm;
2. Modelling stochastic perturbations;
3. Convergence of the stochastic primal-dual algorithm;
4. Rate of convergence.

Deterministic Version: Problem Transformation

Let $\tilde{x}_s = \log(x_s)$ [Lee, Chiang, Calderbank 06]. We obtain

Problem $P$:

  $\max_{\{\tilde{x}_s\}} \sum_{s \in S} U'_s(\tilde{x}_s)$
  subject to
    $\log\big(\sum_{s \in S((i,j))} \exp(\tilde{x}_s)\big) - \log(p_{(i,j)}) - \sum_{k \in N_I^{to}(j)} \log(1 - P_k) \le 0, \quad \forall (i,j)$
    $\sum_{j \in N_{out}(i)} p_{(i,j)} = P_i, \quad \forall i$
    $-1 \le \tilde{x}_s \le \tilde{M}_s, \quad \forall s$
    $0 \le P_i \le 1, \quad \forall i,$

where $U'_s(\tilde{x}_s) = U_s(\exp(\tilde{x}_s))$.

Observation: Problem $P$ is convex and separable if $\kappa \ge 1$.
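To make this observation concrete, here is a minimal numerical sketch (not from the talk; the weights, grid, and $\kappa$ values are arbitrary) that evaluates the transformed utility $U'_s(\tilde{x}) = U_s(\exp(\tilde{x}))$ on a grid and checks that its second differences are non-positive for several $\kappa \ge 1$:

```python
import numpy as np

def utility(r, w=1.0, kappa=2.0):
    """[Mo, Walrand 00] family: w*log(r) if kappa == 1,
    else w * r**(1 - kappa) / (1 - kappa)."""
    if kappa == 1.0:
        return w * np.log(r)
    return w * r ** (1.0 - kappa) / (1.0 - kappa)

def utility_transformed(x_tilde, w=1.0, kappa=2.0):
    """U'_s(x~) = U_s(exp(x~)) after the change of variables x~ = log(x)."""
    return utility(np.exp(x_tilde), w, kappa)

# Concavity check: on a uniform grid, the second differences of a concave
# function are non-positive (up to floating-point noise).
x = np.linspace(-3.0, 3.0, 1001)
for kappa in (1.0, 1.5, 2.0, 5.0):
    f = utility_transformed(x, kappa=kappa)
    second_diff = f[2:] - 2.0 * f[1:-1] + f[:-2]
    print(f"kappa = {kappa}: concave on grid -> {bool(np.all(second_diff <= 1e-9))}")
```

For $\kappa = 1$ the transformed utility is linear in $\tilde{x}$, and for $\kappa > 1$ it is strictly concave, which is what makes the dual approach below applicable.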
Lagrange Dual Approach

The Lagrangian function is

  $L(\tilde{x}, p, \lambda) = \sum_s U'_s(\tilde{x}_s) - \sum_{(i,j)} \lambda_{(i,j)} \log\Big(\sum_{s \in S((i,j))} \exp(\tilde{x}_s)\Big) + \sum_{(i,j)} \lambda_{(i,j)} \log\Big(p_{(i,j)} \prod_{k \in N_I^{to}(j)} (1 - P_k)\Big).$

Then the Lagrange dual function is

  $Q(\lambda) = \max L(\tilde{x}, p, \lambda)$, the maximization being over $\Big\{\sum_{j \in N_{out}(i)} p_{(i,j)} = P_i,\ 0 \le P_i \le 1,\ -1 \le \tilde{x}_s \le \tilde{M}_s\Big\},$

and the dual problem is given by

  $D: \min_{\lambda \ge 0} Q(\lambda).$

Strong Duality

Proposition:
a) There is no duality gap; i.e., the minimum value of the dual problem $D$ equals the maximal value of the primal problem $P$.
b) Let $\Phi$ be the set of $\lambda$ that minimize $Q(\lambda)$. Then $\Phi$ is non-empty and compact; and for every $\lambda \in \Phi$, there exists a unique vector $(\tilde{x}^*, p^*)$ that maximizes the Lagrangian function $L(\cdot, \cdot, \cdot)$.

Distributed Primal-dual Alg.

• The source rates are updated by

  $\tilde{x}_s(n+1) = \Big[\tilde{x}_s(n) + \epsilon_n L_{\tilde{x}_s}(\tilde{x}(n), p(n), \lambda(n))\Big]_{-1}^{\tilde{M}_s},$ where
  $L_{\tilde{x}_s}(\tilde{x}(n), p(n), \lambda(n)) \triangleq \dot{U}_s(\exp(\tilde{x}_s(n)))\,\exp(\tilde{x}_s(n)) - \sum_{(i,j) \in L(s)} \lambda_{(i,j)}(n)\, \frac{\exp(\tilde{x}_s(n))}{\sum_{s' \in S((i,j))} \exp(\tilde{x}_{s'}(n))}.$

• The shadow prices are updated by

  $\lambda_{(i,j)}(n+1) = \Big[\lambda_{(i,j)}(n) - \epsilon_n L_{\lambda_{(i,j)}}(\tilde{x}(n), p(n), \lambda(n))\Big]_0^{\infty},$ where
  $L_{\lambda_{(i,j)}}(\tilde{x}(n), p(n), \lambda(n)) \triangleq \log(p_{(i,j)}(n)) + \sum_{k \in N_I^{to}(j)} \log(1 - P_k(n)) - \log\Big(\sum_{s \in S((i,j))} \exp(\tilde{x}_s(n))\Big).$

• The persistence probabilities are updated by

  $p_{(i,j)}(n+1) = \dfrac{\lambda_{(i,j)}(n)}{\sum_{k \in N_{out}(i)} \lambda_{(i,k)}(n) + \sum_{(l,m):\, m \in N_I^{from}(i),\, l \in N_{in}(m)} \lambda_{(l,m)}(n)}.$

Stochastic Stability

Computing the gradients requires feedback information; unfortunately, the feedback is noisy in practical systems!

A fundamental open question:
(Q) Under what conditions on the noisy feedback do the algorithms converge?

Stochastic stability is pertinent to the following issues:
1. the number of users and/or the queue lengths remain finite;
2. the algorithms converge in some stochastic sense.

Related work on stability:
• session-level randomness: [Bonald, Massoulie 01], [Kelly 05], [Lin, Shroff 05], and more;
• rate of convergence around the equilibrium points: [Kelly, Maulloo, Tan 98], [Kelly 03], ...;
• arbitrary delay: [La, Rajan 06];
• deterministic feedback estimation error: [Mehyar, Spanos, Low 04].

Stochastic Primal-Dual Alg.

Our focus is on bringing packet-level dynamics back into the model and understanding the convergence of the following stochastic algorithm.

• SA algorithm for source rate updating:

  $\tilde{x}_s(n+1) = \big[\tilde{x}_s(n) + \epsilon_n \hat{L}_{\tilde{x}_s}(\tilde{x}(n), p(n), \lambda(n))\big]_{-1}^{\tilde{M}_s}.$

• SA algorithm for shadow price updating:

  $\lambda_{(i,j)}(n+1) = \big[\lambda_{(i,j)}(n) - \epsilon_n \hat{L}_{\lambda_{(i,j)}}(\tilde{x}(n), p(n), \lambda(n))\big]_0^{\infty}.$

• The persistence probability updating rule remains the same.

Structure of Stochastic Gradients

Stochastic gradient $\hat{L}_{\tilde{x}_s}(\cdot, \cdot, \cdot)$: observe that

  $\hat{L}_{\tilde{x}_s}(\tilde{x}(n), p(n), \lambda(n)) = L_{\tilde{x}_s}(\tilde{x}(n), p(n), \lambda(n)) + \alpha_s(n) + \zeta_s(n),$

where $\alpha_s(n)$ is the biased random error in $L_{\tilde{x}_s}(\tilde{x}(n), p(n), \lambda(n))$, given by

  $\alpha_s(n) \triangleq E_n\big[\hat{L}_{\tilde{x}_s}(\tilde{x}(n), p(n), \lambda(n))\big] - L_{\tilde{x}_s}(\tilde{x}(n), p(n), \lambda(n)),$

and $\zeta_s(n)$ is a martingale difference noise:

  $\zeta_s(n) \triangleq \hat{L}_{\tilde{x}_s}(\tilde{x}(n), p(n), \lambda(n)) - E_n\big[\hat{L}_{\tilde{x}_s}(\tilde{x}(n), p(n), \lambda(n))\big].$

Structure of Stochastic Gradients (Cont'd)

Stochastic gradient $\hat{L}_{\lambda_{(i,j)}}(\cdot, \cdot, \cdot)$: observe that

  $\hat{L}_{\lambda_{(i,j)}}(\tilde{x}(n), p(n), \lambda(n)) = L_{\lambda_{(i,j)}}(\tilde{x}(n), p(n), \lambda(n)) + \beta_{(i,j)}(n) + \xi_{(i,j)}(n),$

where $\beta_{(i,j)}(n)$ is the biased random error of $L_{\lambda_{(i,j)}}(\tilde{x}(n), p(n), \lambda(n))$, given by

  $\beta_{(i,j)}(n) \triangleq E_n\big[\hat{L}_{\lambda_{(i,j)}}(\tilde{x}(n), p(n), \lambda(n))\big] - L_{\lambda_{(i,j)}}(\tilde{x}(n), p(n), \lambda(n)),$

and $\xi_{(i,j)}(n)$ is a martingale difference noise:

  $\xi_{(i,j)}(n) \triangleq \hat{L}_{\lambda_{(i,j)}}(\tilde{x}(n), p(n), \lambda(n)) - E_n\big[\hat{L}_{\lambda_{(i,j)}}(\tilde{x}(n), p(n), \lambda(n))\big].$

Technical Assumptions

A1. The estimators of the gradients are based on the measurements in each iteration only.
A2. Condition on the step sizes: $\epsilon_n > 0$, $\epsilon_n \to 0$, $\sum_n \epsilon_n = \infty$, and $\sum_n \epsilon_n^2 < \infty$.
A3. Condition on the biased errors: $\sum_n \epsilon_n |\alpha_s(n)| < \infty,\ \forall s$, and $\sum_n \epsilon_n |\beta_{(i,j)}(n)| < \infty,\ \forall (i,j)$.
A4. Condition on the martingale difference noise: $\sup_n E_n[\zeta_s(n)^2] < \infty,\ \forall s$, and $\sup_n E_n[\xi_{(i,j)}(n)^2] < \infty,\ \forall (i,j)$.
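Before stating the main result, a toy numerical sketch of the updates above may help. It is not the authors' implementation: it assumes a single flow on a single link with no interferers, the effective link rate $c_{\rm eff}$ (standing in for $c\,p\,\prod_k(1-P_k)$) held fixed, the $\kappa = 2$, $w = 1$ utility, unbiased noisy gradients, and step sizes $\epsilon_n = 1/n$ satisfying A2:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy instance: one flow on one link, no interferers, with c_eff standing
# in for c * p * prod_k (1 - P_k), held fixed. For the kappa = 2, w = 1
# utility U(x) = -1/x, the Lagrangian gradients reduce to
#   L_x~(x~, lam)  = exp(-x~) - lam     (since dU'/dx~ = exp(-x~))
#   L_lam(x~, lam) = log(c_eff) - x~    (slack of the log-link constraint)
c_eff = 0.8
x_t, lam = 0.0, 0.5                   # initial x~ = log(rate) and price

for n in range(1, 100_001):
    eps = 1.0 / n                     # step sizes satisfying A2
    # Noisy gradient estimates: the zero-mean terms play the role of the
    # martingale-difference noises zeta(n), xi(n); the biases are zero (A3).
    g_x = (np.exp(-x_t) - lam) + rng.normal(scale=0.3)
    g_l = (np.log(c_eff) - x_t) + rng.normal(scale=0.3)
    x_t = np.clip(x_t + eps * g_x, -1.0, 2.0)   # projection onto [-1, M~]
    lam = max(lam - eps * g_l, 0.0)             # projection onto [0, inf)

print(f"rate exp(x~) = {np.exp(x_t):.3f}   (optimum {c_eff})")
print(f"shadow price = {lam:.3f}   (optimum {1.0 / c_eff:.3f})")
```

With the bias terms identically zero, Conditions A1–A4 hold, and the iterates settle near the primal-dual optimum $(x^*, \lambda^*) = (c_{\rm eff},\, 1/c_{\rm eff})$.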
Main Result on Stochastic Stability

Theorem: Under Conditions A1–A4, the iterates $\{(\tilde{x}(n), \lambda(n), p(n)),\ n = 1, 2, \ldots\}$ generated by the stochastic primal-dual algorithm converge with probability one to the optimal solutions of Problem $\Xi$.

1. Good news: the stochastic algorithm converges to the desired points under Conditions A1–A4.
2. Caution:
• When $\epsilon_n$ or the biased terms do not go to zero, we cannot hope to get convergence w.p.1.
• Even the expectation of the limiting distribution would not be the equilibrium point if the biased terms do not go to zero.
• Nevertheless, we expect that the iterates would converge weakly to some random variable "close" to the equilibrium point.

An Example on Exponential Marking

• Assume exponential marking is used to feed back price information, where the overall non-marking probability for source $s$ is

  $q_s = \exp\Big(-\sum_{(i,j) \in L(s)} \frac{\lambda_{(i,j)}}{\sum_{s' \in S((i,j))} x_{s'}}\Big).$

• To estimate the overall price, source $s$ sends $N_n$ packets during round $n$ and counts the non-marked packets, say $K$ packets. The estimate of $\log(q_s)$ is then $\log(\hat{q})$, where $\hat{q} = K/N_n$.
• By definition, $\alpha_s(n) = \exp(\tilde{x}_s(n))\,\big(E_n[\log(\hat{q})] - \log(q_s)\big).$
• To ensure the bias condition A3, it suffices to have

  $\sum_n \frac{\epsilon_n}{\sqrt{N_n}} < \infty;$

  e.g., when $\epsilon_n = 1/n$, $N_n \sim O(\log^4(n))$ is sufficient.
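A minimal Monte Carlo sketch of this estimator (toy numbers: the end-to-end price $0.7$ and the sample sizes are arbitrary) shows the bias of $\log(\hat{q})$ shrinking as $N_n$ grows, which is the quantity the summability condition above trades off against $\epsilon_n$:

```python
import numpy as np

rng = np.random.default_rng(1)

# Each of the N_n probe packets is independently unmarked with probability
# q_s; the source observes K unmarked packets and estimates log(q_s) by
# log(K / N_n). The bias shrinks as N_n grows -- the trade-off against
# eps_n in the summability condition above. (Toy value of q_s below.)
q = np.exp(-0.7)                       # true end-to-end non-marking prob.

for N_n in (50, 200, 1000, 5000):
    K = rng.binomial(N_n, q, size=200_000)     # 200k simulated rounds
    K = np.maximum(K, 1)               # guard against log(0) if all marked
    bias = np.mean(np.log(K / N_n)) - np.log(q)
    print(f"N_n = {N_n:5d}: bias of log(q^) = {bias:+.5f}")
```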
Proof for the Main Result

Sketch of the proof:
1. First, using the stochastic Lyapunov Stability Theorem, we establish the recurrence of any arbitrarily small neighborhood of the optimal points. That is, the iterates generated by the stochastic primal-dual algorithm return to a given neighborhood of the optimal points infinitely often.
2. Then we show, via "local analysis," that the recurrent iterates eventually reside in an arbitrarily small neighborhood of the optimal points.

• Let $(\tilde{x}^*, p^*, \lambda^*)$ be a saddle point.
• Define the Lyapunov function $V(\cdot, \cdot)$ as follows:

  $V(\tilde{x}, \lambda) \triangleq \|\tilde{x} - \tilde{x}^*\|^2 + \min_{\lambda^* \in \Phi} \|\lambda - \lambda^*\|^2.$

• Note that $V(\cdot, \cdot)$ is well-defined since $\Phi$ is compact.
• It can be shown that $V(\cdot, \cdot)$ is continuous.

Neighbourhood of Saddle Points

• For a given $\mu > 0$, define the neighborhood set

  $A_\mu \triangleq \{(\tilde{x}, \lambda) : V(\tilde{x}, \lambda) \le \mu\}.$

• We note that in general there can be multiple optimal shadow prices in $\Phi$. So essentially $A_\mu = \bigcup_{\lambda^* \in \Phi} \{(\tilde{x}, \lambda) : \|\tilde{x} - \tilde{x}^*\|^2 + \|\lambda - \lambda^*\|^2 \le \mu\}$.
• $A_\mu$ is compact.

Step I – Lyapunov Stability

Observe that:

  $\|\tilde{x}(n+1) - \tilde{x}^*\|^2 \le \|\tilde{x}(n) - \tilde{x}^*\|^2 + 2\epsilon_n (\tilde{x}(n) - \tilde{x}^*)^T L_{\tilde{x}}(\tilde{x}(n), p(n), \lambda(n)) + 2\epsilon_n (\tilde{x}(n) - \tilde{x}^*)^T (\alpha(n) + \zeta(n)) + \epsilon_n^2 \|L_{\tilde{x}}(\tilde{x}(n), p(n), \lambda(n)) + \alpha(n) + \zeta(n)\|^2.$

Taking conditional expectations,

  $E_n(\|\tilde{x}(n+1) - \tilde{x}^*\|^2) \le \|\tilde{x}(n) - \tilde{x}^*\|^2 + 2\epsilon_n (\tilde{x}(n) - \tilde{x}^*)^T L_{\tilde{x}}(\tilde{x}(n), p(n), \lambda(n)) + O(\epsilon_n \|\alpha(n)\|) + O(\epsilon_n^2).$

Similarly, we have that

  $E_n(\|\lambda(n+1) - \lambda^*\|^2) \le \|\lambda(n) - \lambda^*\|^2 - 2\epsilon_n (\lambda(n) - \lambda^*)^T L_{\lambda}(\tilde{x}(n), p(n), \lambda(n)) + O(\epsilon_n \|\beta(n)\|) + O(\epsilon_n^2).$

(Cont'd)

By the definition of $V(\cdot, \cdot)$, we have that

  $E_n[V(\tilde{x}(n+1), \lambda(n+1))] \le E_n\big[\|\tilde{x}(n+1) - \tilde{x}^*\|^2 + \|\lambda(n+1) - \lambda^*\|^2\big]$
  $\le \|\tilde{x}(n) - \tilde{x}^*\|^2 + \|\lambda(n) - \lambda^*\|^2 + 2\epsilon_n G(\tilde{x}(n), \lambda(n)) + O\big(\epsilon_n (\|\alpha(n)\| + \|\beta(n)\|)\big) + O(\epsilon_n^2).$

Note that the above inequality is true for all $\lambda^* \in \Phi$, and therefore

  $E_n[V(\tilde{x}(n+1), \lambda(n+1))] \le V(\tilde{x}(n), \lambda(n)) + 2\epsilon_n G(\tilde{x}(n), \lambda(n)) + O\big(\epsilon_n (\|\alpha(n)\| + \|\beta(n)\|)\big) + O(\epsilon_n^2),$

with the understanding that

  $G(\tilde{x}(n), \lambda(n)) = (\tilde{x}(n) - \tilde{x}^*)^T L_{\tilde{x}}(\tilde{x}(n), p(n), \lambda(n)) - (\lambda(n) - \lambda^*_{\min})^T L_{\lambda}(\tilde{x}(n), p(n), \lambda(n)),$

where $\lambda^*_{\min} = \arg\min_{\lambda \in \Phi} \|\lambda(n) - \lambda\|^2$.

A Supermartingale Lemma

Let $\{X_n\}$ be an $R^r$-valued stochastic process, and $V(\cdot)$ be a real-valued, non-negative function on $R^r$. Suppose that $\{Y_n\}$ is a sequence of random variables satisfying $\sum_n |Y_n| < \infty$ with probability one. Let $\{F_n\}$ be a sequence of $\sigma$-algebras generated by $\{X_i, Y_i,\ i \le n\}$. Suppose that there exists a compact set $A \subset R^r$ such that for all $n$,

  $E_n[V(X_{n+1})] - V(X_n) \le -\epsilon_n \sigma + Y_n, \quad \text{for } X_n \notin A,$

where $\epsilon_n$ satisfies A2 and $\sigma$ is a positive constant. Then the set $A$ is recurrent for $\{X_n\}$; i.e., $X_n \in A$ for infinitely many $n$ with probability one.

Recurrence of $A_\mu$

It suffices to show that $G(\tilde{x}(n), \lambda(n)) < 0$ when $(\tilde{x}(n), \lambda(n)) \in A_\mu^c$. We upper-bound $G(\cdot, \cdot)$ using the following facts.

• Since the Lagrangian function is concave in $\tilde{x}$, it follows that

  $L(\tilde{x}(n), p(n), \lambda(n)) - L(\tilde{x}^*, p(n), \lambda(n)) \ge (\tilde{x}(n) - \tilde{x}^*)^T L_{\tilde{x}}(\tilde{x}(n), p(n), \lambda(n)).$

• Since $L(\tilde{x}, p, \lambda)$ is linear in $\lambda$, we have that

  $L(\tilde{x}(n), p(n), \lambda^*) - L(\tilde{x}(n), p(n), \lambda(n)) = (\lambda^* - \lambda(n))^T L_{\lambda}(\tilde{x}(n), p(n), \lambda(n)).$

(Cont'd)

Therefore,

  $G(\tilde{x}(n), \lambda(n)) \le L(\tilde{x}(n), p(n), \lambda^*) - L(\tilde{x}^*, p(n), \lambda(n))$
  $= L(\tilde{x}(n), p(n), \lambda^*) - L(\tilde{x}^*, p^*, \lambda^*) + L(\tilde{x}^*, p^*, \lambda^*) - L(\tilde{x}^*, p^*, \lambda(n)) + L(\tilde{x}^*, p^*, \lambda(n)) - L(\tilde{x}^*, p(n), \lambda(n)).$

Observe that:
• The first two terms are negative due to the saddle point property of $(\tilde{x}^*, p^*, \lambda^*)$; i.e., $L(\tilde{x}(n), p(n), \lambda^*) \le L(\tilde{x}^*, p^*, \lambda^*) \le L(\tilde{x}^*, p^*, \lambda(n))$.
• The last term is negative because, when $(\tilde{x}^*, \lambda(n))$ is fixed, $p(n)$ is the unique maximizer of $L(\cdot, \cdot, \cdot)$.

Conclusion: there exists $\delta_\mu > 0$ such that $G(\tilde{x}(n), \lambda(n)) < -\delta_\mu$ when $(\tilde{x}(n), \lambda(n)) \in A_\mu^c$. Therefore, $A_\mu$ is recurrent a.s.

Step II – Local Analysis

• We next show that $\{(\tilde{x}(n), \lambda(n)),\ n = 1, 2, \ldots\}$ leaves $A_{3\mu}$ only finitely often with probability one.
• Let $\{n_k,\ k = 1, 2, \ldots\}$ denote the recurrence times such that $(\tilde{x}(n_k), \lambda(n_k)) \in A_\mu$.
• Thus, it suffices to show that there exists $n_{k_0} \in \{n_k,\ k = 1, 2, \ldots\}$ such that for all $n \ge n_{k_0}$, the original iterates $\{(\tilde{x}(n), \lambda(n)),\ n = 1, 2, \ldots\}$ reside in $A_{3\mu}$ almost surely.

One Step Ahead

Recall that

  $\begin{bmatrix} \tilde{x}_s(n+1) \\ \lambda_{(i,j)}(n+1) \end{bmatrix} = \begin{bmatrix} \tilde{x}_s(n) \\ \lambda_{(i,j)}(n) \end{bmatrix} + \epsilon_n \begin{bmatrix} L_{\tilde{x}_s}(\tilde{x}(n), p(n), \lambda(n)) \\ -L_{\lambda_{(i,j)}}(\tilde{x}(n), p(n), \lambda(n)) \end{bmatrix} + \epsilon_n \begin{bmatrix} \alpha_s(n) + \zeta_s(n) \\ \beta_{(i,j)}(n) + \xi_{(i,j)}(n) \end{bmatrix} + \epsilon_n \begin{bmatrix} Z_n^{\tilde{x}_s} \\ Z_n^{\lambda_{(i,j)}} \end{bmatrix},$

where $Z_n^{\tilde{x}_s}$ and $Z_n^{\lambda_{(i,j)}}$ are the correction (reflection) terms due to the projections.

Basic idea: show that these three terms diminish asymptotically!

Rate of Convergence

• The rate of convergence is concerned with the asymptotic behavior of the normalized errors about the optimal points.
• As is standard, assume that the iterates generated by the stochastic primal-dual algorithm have entered a small neighborhood of an optimal solution $(\tilde{x}^*, \lambda^*)$.
• Define $U_{\tilde{x}}(n) \triangleq (\tilde{x}(n) - \tilde{x}^*)/\sqrt{\epsilon_n}$ and $U_\lambda(n) \triangleq (\lambda(n) - \lambda^*)/\sqrt{\epsilon_n}$.
• Construct $U^n(t)$ to be the piecewise constant interpolation of $U(n) = (U_{\tilde{x}}(n), U_\lambda(n))$; i.e., $U^n(t) = U_{n+i}$ for $t \in [t_{n+i} - t_n,\ t_{n+i+1} - t_n)$, where $t_n \triangleq \sum_{i=0}^{n-1} \epsilon_i$.

[Figure: the piecewise constant interpolation $U^n(t)$, taking values $U_n, U_{n+1}, U_{n+2}$ on successive intervals of widths $\epsilon_n, \epsilon_{n+1}, \epsilon_{n+2}$.]

Assumptions

A5. Let $\theta(n) \triangleq (\tilde{x}(n), \lambda(n))$ and $\phi_n \triangleq (\zeta(n), \xi(n))$. Suppose that for any given small $\rho > 0$, there exists a positive definite symmetric matrix $\Sigma = \sigma \sigma^T$ such that

  $E_n[\phi_n \phi_n^T - \Sigma]\, I\{|\theta(n) - \theta^*| \le \rho\} \to 0 \quad \text{as } n \to \infty.$

Define

  $A \triangleq \begin{bmatrix} L_{\tilde{x}\tilde{x}}(\tilde{x}^*, p^*, \lambda^*) & L_{\tilde{x}\lambda}(\tilde{x}^*, p^*, \lambda^*) \\ -L_{\lambda\tilde{x}}(\tilde{x}^*, p^*, \lambda^*) & 0 \end{bmatrix}. \quad (1)$

A6. Let $\epsilon_n = 1/n$, and assume $A + I/2$ is a Hurwitz matrix. Note that it can be shown that the real parts of the eigenvalues of $A$ are all non-positive [Bertsekas, 99].
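Condition A6 is straightforward to check numerically. The sketch below builds a hypothetical linearization matrix $A$ of the form (1) (the blocks are made-up illustrative values, not derived from the network model) and tests whether $A + I/2$ is Hurwitz:

```python
import numpy as np

# Hypothetical blocks for A (made-up values, for illustration only):
# L_x~x~ must be negative definite; L_x~lam couples rates and prices.
L_xx = np.array([[-2.0,  0.0],
                 [ 0.0, -1.5]])
L_xl = np.array([[-1.0,  0.0],
                 [ 0.0, -1.0]])

# A = [[L_x~x~, L_x~lam], [-L_lam x~, 0]] as in (1), with L_lam x~ = L_x~lam^T.
A = np.block([[L_xx, L_xl],
              [-L_xl.T, np.zeros((2, 2))]])

print("eigenvalues of A      :", np.round(np.linalg.eigvals(A), 3))
eigs = np.linalg.eigvals(A + 0.5 * np.eye(4))
print("eigenvalues of A + I/2:", np.round(eigs, 3))
print("A + I/2 Hurwitz:", bool(np.all(eigs.real < 0)))
```

For this toy matrix, the eigenvalues of $A$ all have non-positive real parts, consistent with the note above, and shifting by $I/2$ leaves them strictly in the left half-plane.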
Main Result on Rate of Convergence

a) Under Conditions A1 and A3–A6, $U^n(\cdot)$ converges weakly to the solution (denoted $U$) of the Skorohod problem

  $\begin{bmatrix} dU_{\tilde{x}} \\ dU_\lambda \end{bmatrix} = \Big(A + \frac{I}{2}\Big) \begin{bmatrix} U_{\tilde{x}} \\ U_\lambda \end{bmatrix} dt + \sigma\, dw(t) + \begin{bmatrix} dz_{\tilde{x}} \\ dz_\lambda \end{bmatrix}.$

b) If $(\tilde{x}^*, \lambda^*)$ is an interior point of the constraint set, then the limiting process $U$ is a stationary Gaussian diffusion process, and $U(n)$ converges in distribution to a normally distributed random variable with mean zero and covariance $\Sigma$.
c) If $(\tilde{x}^*, \lambda^*)$ is on the boundary of the constraint set, then the limiting process $U$ is a stationary reflected linear diffusion process.

Remarks and Engineering Insights

1) In general, the limit process is a stationary reflected linear diffusion process, not necessarily the standard Gaussian diffusion process.
2) The limit process would be Gaussian if there were no reflection term, for instance when all the link constraints in Problem $P$ are active at the optimal point.
3) The rate of convergence depends heavily on the smallest eigenvalue of $\big(A + \frac{I}{2}\big)$. The more negative the smallest eigenvalue is, the faster the rate of convergence.
4) Intuitively speaking, the reflection terms help increase the speed of convergence, although this effect unfortunately cannot be characterized exactly.
5) The covariance matrix of the limit process gives a measure of the spread around the equilibrium point, and is typically "smaller" than in the unconstrained case.

Conclusions

1) We have studied joint flow control and MAC design in multi-hop wireless networks with random access, and formulated rate control as a network utility maximization problem.
2) We used the Lagrangian dual decomposition method to devise a distributed primal-dual algorithm that provides an exact solution.
3) We have shown that the proposed primal-dual algorithm converges almost surely to the optimal solutions provided that the estimators are asymptotically unbiased.
4) Our findings on the rate of convergence reveal that, in general, the limit of the interpolated process corresponding to the normalized iterate sequence generated by the primal-dual algorithm is a reflected linear diffusion process, not necessarily the Gaussian diffusion process.