Dynamic Optimization and Learning for Renewal Systems
With Applications to Wireless Networks and Peer-to-Peer Networks
Michael J. Neely, University of Southern California
[Title figure: a network coordinator and transmitter/receiver (T/R) nodes processing tasks 1, 2, 3, … over renewal frames of durations T[0], T[1], T[2], …]

Outline:
• Optimization of Renewal Systems
• Application 1: Task Processing in Wireless Networks
   Quality-of-Information (ARL CTA project)
   Task "deluge" problem
• Application 2: Peer-to-Peer Networks
   Social networks (ARL CTA project)
   Internet and wireless

References:
General Theory and Application 1:
• M. J. Neely, Stochastic Network Optimization with Application to Communication and Queueing Systems, Morgan & Claypool, 2010.
• M. J. Neely, "Dynamic Optimization and Learning for Renewal Systems," Proc. Asilomar Conf. on Signals, Systems, and Computers, Nov. 2010.
Application 2 (Peer-to-Peer):
• M. J. Neely and L. Golubchik, "Utility Optimization for Dynamic Peer-to-Peer Networks with Tit-for-Tat Constraints," Proc. IEEE INFOCOM, 2011.
These works are available at: http://www-bcf.usc.edu/~mjneely/

A General Renewal System
[Figure: a timeline of back-to-back frames with durations T[0], T[1], T[2], … and penalty vectors y[0], y[1], y[2], …]
• Renewal frames r in {0, 1, 2, …}.
• π[r] = policy chosen on frame r.
• P = abstract policy space (π[r] in P for all r).
• Policy π[r] affects the frame size and penalty vector on frame r. These are random functions of π[r] (their distribution depends on π[r]):
   y[r] = [y0(π[r]), y1(π[r]), …, yL(π[r])] = penalty vector for frame r
   T[r] = T(π[r]) = frame duration
• Example realizations of (y[r], T[r]) on successive frames: y[r] = [1.2, 1.8, …, 0.4] with T[r] = 8.1; y[r] = [0.0, 3.8, …, -2.0] with T[r] = 12.3; y[r] = [1.7, 2.2, …, 0.9] with T[r] = 5.6.

Example 1: Opportunistic Scheduling
• All frames = 1 slot.
• S[r] = (S1[r], S2[r], S3[r]) = channel states for slot r.
• Policy π[r]: on frame r, first observe S[r], then choose a channel to serve (i.e., a choice in {1, 2, 3}).
• Example objectives: throughput, energy, fairness, etc.
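To make the policy structure of Example 1 concrete, here is a minimal simulation sketch in Python. The Bernoulli channel model, the ON probabilities, and the per-channel weights are illustrative assumptions (not from the talk); the policy shown is a simple max-weight choice among ON channels.

```python
# Minimal sketch of Example 1 (illustrative assumptions): i.i.d. Bernoulli
# channel states, one-slot frames, and a policy that serves the ON channel
# with the largest weight.
import random

ON_PROB = [0.3, 0.5, 0.8]   # hypothetical P(channel i is ON)
WEIGHTS = [3.0, 2.0, 1.0]   # hypothetical per-channel service weights

def run(num_frames=100_000, seed=0):
    rng = random.Random(seed)
    served_fraction = [0.0, 0.0, 0.0]
    for _ in range(num_frames):
        # Observe channel states S[r] for this one-slot frame.
        S = [1 if rng.random() < p else 0 for p in ON_PROB]
        # Opportunistic policy: serve the ON channel with the largest weight.
        candidates = [i for i in range(3) if S[i] == 1]
        if candidates:
            i = max(candidates, key=lambda j: WEIGHTS[j])
            served_fraction[i] += 1.0
    return [x / num_frames for x in served_fraction]

print(run())  # per-channel time-average service rates
```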
Example 2: Convex Programs (Deterministic Problems)
Minimize:   f(x1, x2, …, xN)
Subject to: gk(x1, x2, …, xN) ≤ 0 for all k in {1, …, K}
            (x1, x2, …, xN) in A
Equivalent to the renewal problem:
Minimize:   time average of f(x1[r], x2[r], …, xN[r])
Subject to: time average of gk(x1[r], x2[r], …, xN[r]) ≤ 0 for all k in {1, …, K}
            (x1[r], x2[r], …, xN[r]) in A for all frames r
• All frames = 1 slot.
• Policy π[r] = (x1[r], x2[r], …, xN[r]) in A.
• Time average: f̄ = lim_{R→∞} (1/R) ∑_{r=0}^{R-1} f(x[r])
• Jensen's inequality: the time average of the dynamic solution (x1[r], x2[r], …, xN[r]) solves the original convex program!

Example 3: Markov Decision Problems
[Figure: a recurrent Markov chain with states 1, 2, 3, 4]
• M(t) = recurrent Markov chain (continuous or discrete time).
• Renewals are defined as recurrences to state 1.
• T[r] = random inter-renewal frame size (frame r).
• y[r] = penalties incurred over frame r.
• π[r] = policy that affects the transition probabilities over frame r.
• Objective: minimize the time average of one penalty subject to time average constraints on the others.

Example 4: Task Processing over Networks
[Figure: a network coordinator and T/R nodes serving an infinite task sequence]
• Infinite sequence of tasks, e.g., query sensors and/or perform computations.
• Renewal frame r = processing time for frame r.
• Policy types:
   Low level: {specify transmission decisions over the network}
   High level: {Backpressure1, Backpressure2, Shortest Path}
• Example objective: maximize quality of information per unit time subject to per-node power constraints.

Quick Review of Renewal-Reward Theory (Pop Quiz Next Slide!)
Define the frame averages:
   ȳ0 = lim_{R→∞} (1/R) ∑_{r=1}^{R} y0[r],   T̄ = lim_{R→∞} (1/R) ∑_{r=1}^{R} T[r]
The time average of y0[r] (total penalty divided by total time) is then:
   Time Avg. = lim_{R→∞} [∑_{r=1}^{R} y0[r]] / [∑_{r=1}^{R} T[r]] = ȳ0 / T̄
*If behavior is i.i.d. over frames, by the LLN this is the same as E{y0}/E{T}.

Pop Quiz: (10 points)
• Let y0[r] = energy expended on frame r.
• Time avg. power = (total energy used)/(total time).
• Suppose (for simplicity) behavior is i.i.d. over frames.
To minimize time average power, which one should we minimize?
   (a) E{y0[r]/T[r]}        (b) E{y0[r]}/E{T[r]}
Answer: (b). By renewal-reward theory, time average power is (total energy)/(total time) → E{y0[r]}/E{T[r]}, which in general differs from the expected per-frame ratio E{y0[r]/T[r]} (a numeric check follows below).
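The sketch below is a quick numeric check of the pop-quiz answer. The two-valued (energy, duration) frame distribution is an assumed example, chosen only so that quantities (a) and (b) visibly differ.

```python
# Numeric check of the pop quiz (assumed distribution, not from the talk):
# each frame independently has (energy, duration) = (2, 1) or (2, 4),
# each with probability 1/2.
import random

def simulate(num_frames=1_000_000, seed=1):
    rng = random.Random(seed)
    tot_energy = tot_time = ratio_sum = 0.0
    for _ in range(num_frames):
        y0, T = (2.0, 1.0) if rng.random() < 0.5 else (2.0, 4.0)
        tot_energy += y0          # total energy used
        tot_time += T             # total time elapsed
        ratio_sum += y0 / T       # per-frame ratio (quantity (a))
    avg_power = tot_energy / tot_time     # -> E{y0}/E{T} = 2/2.5 = 0.8
    mean_ratio = ratio_sum / num_frames   # -> E{y0/T} = (2 + 0.5)/2 = 1.25
    return avg_power, mean_ratio

print(simulate())  # (b) ≈ 0.8 is the true time average power, not (a) ≈ 1.25
```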
Two General Problem Types:
1) Minimize a time average subject to time average constraints:
   Minimize:   ȳ0 / T̄
   Subject to: (1) ȳl / T̄ ≤ cl  ∀l ∈ {1, …, L}
               (2) π[r] ∈ P  ∀r ∈ {0, 1, 2, …}
2) Maximize a concave function φ(x1, …, xL) of the time averages:
   Maximize:   φ(ȳ1/T̄, ȳ2/T̄, …, ȳL/T̄)
   Subject to: (1) ȳl / T̄ ≤ cl  ∀l ∈ {1, …, L}
               (2) π[r] ∈ P  ∀r ∈ {0, 1, 2, …}

Solving the Problem (Type 1):
   Minimize:   ȳ0 / T̄
   Subject to: (1) ȳl / T̄ ≤ cl  ∀l ∈ {1, …, L}
               (2) π[r] ∈ P  ∀r ∈ {0, 1, 2, …}
Define a "virtual queue" Zl[r] for each inequality constraint, with arrivals yl[r] and service clT[r]:
   Zl[r+1] = max[Zl[r] – clT[r] + yl[r], 0]
The queue is stable iff ȳl ≤ cl T̄, i.e., iff ȳl / T̄ ≤ cl.

Lyapunov Function and "Drift-Plus-Penalty Ratio":
• Scalar measure of queue sizes: L[r] = Z1[r]² + Z2[r]² + … + ZL[r]²
• Δ(Z[r]) = E{L[r+1] – L[r] | Z[r]} = "frame-based Lyapunov drift"
• Algorithm technique: every frame r, observe Z1[r], …, ZL[r]. Then choose a policy π[r] in P to minimize the "drift-plus-penalty ratio":
   [Δ(Z[r]) + V E{y0[r] | Z[r]}] / E{T[r] | Z[r]}

The Algorithm Becomes:
• Observe Z[r] = (Z1[r], …, ZL[r]). Choose π[r] in P to minimize:
   E{V ŷ0(π[r]) + ∑_{l=1}^{L} Zl[r] ŷl(π[r]) | Z[r]} / E{T̂(π[r]) | Z[r]}
• Then update the virtual queues: Zl[r+1] = max[Zl[r] – clT[r] + yl[r], 0]

Theorem: Assume the constraints are feasible. Then under this algorithm we achieve:
(a) limsup_{R→∞} [∑_{r=1}^{R} yl[r]] / [∑_{r=1}^{R} T[r]] ≤ cl  ∀l ∈ {1, …, L}  (w.p.1)
(b) limsup_{R→∞} [∑_{r=1}^{R} E{y0[r]}] / [∑_{r=1}^{R} E{T[r]}] ≤ ratio_opt + O(1/V)
where ratio_opt is the optimal ratio, and the bounds hold for all frames r in {1, 2, 3, …}.

Application 1 – Task Processing:
[Figure: network coordinator and five T/R nodes; each frame r = setup + transmit + idle time I[r]]
• Every task reveals random task parameters η[r]:
   η[r] = [(qual1[r], T1[r]), (qual2[r], T2[r]), …, (qual5[r], T5[r])]
• Choose π[r] = [which node transmits, how much idle time] in {1, 2, 3, 4, 5} × [0, Imax].
• Transmissions incur power.
• We use a quality distribution that tends to be better for higher-numbered nodes.
• Maximize quality/time subject to pav ≤ 0.25 for all nodes.

Minimizing the Drift-Plus-Penalty Ratio:
• Each frame requires minimizing a ratio of expectations:
   θ* = inf_{π∈P} E{a(π, η)} / E{b(π, η)}
• Minimizing a pure expectation, rather than a ratio, is typically easier (see Bertsekas & Tsitsiklis, Neuro-Dynamic Programming).
• Define: val(θ) = inf_{π∈P} E{a(π, η) – θ b(π, η)}
• "Bisection Lemma":
   val(θ) = 0  if θ = θ*
   val(θ) < 0  if θ > θ*
   val(θ) > 0  if θ < θ*
  so θ* can be found by bisection on θ.

Learning via Sampling from the Past:
• Suppose the randomness is characterized by past samples {η1, η2, …, ηW}.
• We want to compute (over the unknown random distribution of η):
   val(θ) = inf_{π∈P} E{a(π, η) – θ b(π, η)}
• Approximate this via the W samples from the past (a computational sketch follows):
   ṽal(θ) = inf_{π∈P} (1/W) ∑_{w=1}^{W} [a(π, ηw) – θ b(π, ηw)]
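The bisection lemma plus the sampling approximation suggest a simple computational recipe. The sketch below is illustrative only: it assumes a finite policy set and W past samples (a(π, ηw), b(π, ηw)) per policy (with b > 0), and bisects on θ until the empirical val(θ) crosses zero.

```python
# Minimal sketch of the bisection lemma over sampled data (assumes a
# finite policy set, which the talk's abstract space P need not be).
# samples[p] is a list of (a, b) pairs observed under policy p, with b > 0.

def val(theta, samples):
    # Empirical val(theta) = min over policies of (1/W) * sum [a - theta*b].
    return min(
        sum(a - theta * b for a, b in obs) / len(obs)
        for obs in samples.values()
    )

def bisect_ratio(samples, lo=0.0, hi=100.0, tol=1e-9):
    # Find theta* with val(theta*) = 0: val > 0 below theta*, < 0 above.
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if val(mid, samples) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Hypothetical example: two policies with different sampled ratios.
samples = {
    "policy1": [(4.0, 2.0), (6.0, 2.0)],   # mean a / mean b = 10/4 = 2.5
    "policy2": [(3.0, 3.0), (3.0, 1.0)],   # mean a / mean b = 6/4 = 1.5
}
print(bisect_ratio(samples))  # -> 1.5 (policy2 achieves the smaller ratio)
```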
Simulation: Quality of Information / Unit Time
[Plot: quality of information per unit time vs. sample size W, comparing the Drift-Plus-Penalty Ratio algorithm with bisection against an alternative algorithm with time averaging]
Concluding sims (values for W = 10):
   ȳ1/T̄ = 0.182335 ≤ 0.25
   ȳ2/T̄ = 0.249547 ≤ 0.25
   ȳ3/T̄ = 0.250018 ≤ 0.25
   ȳ4/T̄ = 0.250032 ≤ 0.25
   ȳ5/T̄ = 0.250046 ≤ 0.25
   q.o.i./T̄ = 0.852950,  Idle = 1.421260

"Application 2" – Peer-to-Peer Wireless Networking:
[Figure: five nodes connected through a network cloud]
• N nodes. Each node n has a download social group Gn, a subset of {1, …, N}.
• Each file f is stored at some subset of nodes Nf.
• Each node n can request download of a file f from any node in Gn ∩ Nf.
• Transmission rates (µab(t)) between nodes are chosen in some (possibly time-varying) set G(t).

"Internet Cloud" Example 1:
• G(t) = constant (no variation).
• ∑b µnb(t) ≤ Cn^uplink for all nodes n.
• This example assumes uplink capacity is the bottleneck.

"Internet Cloud" Example 2:
• G(t) specifies a single supportable rate matrix (µab(t)). There are no "transmission rate decisions": the allowable rates (µab(t)) are given to the peer-to-peer system by some underlying transport and routing protocol.

"Wireless Basestation" Example 3:
[Figure: a base station coordinating wireless devices]
• Wireless device-to-device transmission increases capacity.
• (µab(t)) chosen in G(t); transmissions coordinated by the base station.

"Commodities" for Request Allocation:
• Multiple file downloads can be active, and each file corresponds to a subset of nodes. Queueing files according to subsets would result in O(2^N) queues (complexity explosion!). Instead, without loss of optimality, we use the following alternative commodity structure (a minimal sketch follows):
• Node n sees requests (An(t), Nn(t)), where Nn(t) is the subset of nodes storing the requested file.
• Use the subset info to determine the decision set Gn ∩ Nn(t).
• Choose which node m will help download.
• That node queues the request: Qmn(t+1) = max[Qmn(t) + Rmn(t) – µmn(t), 0]
• The subset info can then be thrown away.
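Below is a minimal sketch of this commodity structure, keeping only the O(N²) per-pair request queues Qmn rather than one queue per file subset. The shortest-queue routing rule here is a placeholder assumption; the incentive-based allocation rule the talk actually uses is given in the next section.

```python
# Minimal sketch of the commodity structure (placeholder routing rule).
# Only O(N^2) per-pair queues Q[m][n] are kept, never one per file subset.
N = 5
Q = [[0.0] * N for _ in range(N)]            # Q[m][n]: requests of n queued at m
G = [set(range(N)) - {n} for n in range(N)]  # hypothetical social groups Gn

def allocate_request(n, holders, amount):
    """Route node n's request among nodes in Gn that store the file."""
    candidates = G[n] & holders               # decision set Gn ∩ Nn(t)
    if not candidates:
        return                                # no one can help
    m = min(candidates, key=lambda j: Q[j][n])
    Q[m][n] += amount                         # queue it; subset info discarded

def serve(mu):
    """Apply transmission rates mu[m][n] for one slot."""
    for m in range(N):
        for n in range(N):
            Q[m][n] = max(Q[m][n] - mu[m][n], 0.0)

allocate_request(0, holders={2, 4}, amount=1.0)  # node 0 asks node 2 or 4
```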
Stochastic Network Optimization Problem:
Maximize:   ∑n gn(∑a r̄an)   (concave utility function of the time average request rates)
Subject to: (1) Q̄mn < ∞ for all (m, n)   (queue stability constraint)
            (2) α ∑a r̄an ≤ β + ∑b r̄nb for all n   (tit-for-tat constraint: α × download rate ≤ β + upload rate)

Solution Technique for the Infocom Paper:
• Use the "drift-plus-penalty" framework in a new "universal scheduling" scenario.
• We make no statistical assumptions on the stochastic processes [S(t); (An(t), Nn(t))].

Resulting Algorithm:
• (Auxiliary variables) For each n, choose an auxiliary variable γn(t) in the interval [0, Amax] to maximize: V gn(γn(t)) – Hn(t)γn(t)
• (Request allocation) For each n, observe the following value for all m in Gn ∩ Nn(t):
      –Qmn(t) + Hn(t) + (Fm(t) – αFn(t))
  Give An(t) to the queue m with the largest non-negative value; drop An(t) if all values are negative.
• (Scheduling) Choose (µab(t)) in G(t) to maximize: ∑n,b µnb(t) Qnb(t)

How the Incentives Work for Node n:
Node n can only request downloads from others if it finds a node m with a non-negative value of:
      –Qmn(t) + Hn(t) + (Fm(t) – αFn(t))
(The term –Qmn(t) + Hn(t) is bounded; the term (Fm(t) – αFn(t)) compares reputations!)
Fn(t) = "node n reputation" (good reputation = low value): a virtual queue with arrivals α × ReceiveHelp(t) and service β + HelpOthers(t). A sketch of this allocation step follows.
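For concreteness, the sketch below implements just the request-allocation step of the algorithm above. The queue values Q, H, F and the parameter α are hypothetical numbers chosen for illustration.

```python
# Minimal sketch of the incentive-based request allocation step
# (hypothetical state values; alpha is an assumed tit-for-tat parameter).
ALPHA = 0.9

def allocate(n, candidates, Q, H, F):
    """Return the helper m for node n's requests, or None to drop them.

    candidates = Gn ∩ Nn(t); Q[m][n], H[n], F[m] as in the algorithm.
    """
    def score(m):
        # -Qmn(t) + Hn(t) + (Fm(t) - alpha*Fn(t))
        return -Q[m][n] + H[n] + (F[m] - ALPHA * F[n])
    best = max(candidates, key=score, default=None)
    if best is None or score(best) < 0:
        return None        # all values negative: drop An(t)
    return best            # largest non-negative value wins

# Hypothetical state for 3 nodes:
Q = [[0, 2, 1], [3, 0, 0], [1, 4, 0]]
H = [5.0, 1.0, 2.0]
F = [0.5, 3.0, 0.0]   # node 1 has received the most help, so it "owes" uploads
print(allocate(0, {1, 2}, Q, H, F))  # -> 1: node 1 is asked to help node 0
```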
Concluding Theorem:
For any arbitrary sample path [S(t); (An(t), Nn(t))], we guarantee:
a) Qmn(t) ≤ Qmax = O(V) for all t and all (m, n).
b) All tit-for-tat constraints are satisfied.
c) For any T > 0:
   liminf_{K→∞} [Achieved Utility(KT)] ≥ liminf_{K→∞} (1/K) ∑_{i=1}^{K} ["T-Slot-Lookahead-Utility[i]"] – BT/V
[Timeline: frames [0, T), [T, 2T), [2T, 3T), …]

Conclusions for the Peer-to-Peer Problem:
• A framework for posing peer-to-peer networking as a stochastic network optimization problem.
• Can compute the optimal solution in polynomial time.

Conclusions Overall:
• The renewal optimization framework can be viewed as "generalized linear programming."
• Variable-length scheduling modes.
• Many applications (task processing, peer-to-peer networks, Markov decision problems, linear programs, convex programs, the stock market, smart grid, energy harvesting, and many more).

Solving the Problem (Type 2):
   Maximize:   φ(ȳ1/T̄, ȳ2/T̄, …, ȳL/T̄)
   Subject to: (1) ȳl / T̄ ≤ cl  ∀l ∈ {1, …, L}
               (2) π[r] ∈ P  ∀r ∈ {0, 1, 2, …}
We reduce it to a problem with the structure of Type 1 via:
• Auxiliary variables γ[r] = (γ1[r], …, γL[r]).
• The following variation on Jensen's inequality: for any concave function φ(x1, …, xL) and any (arbitrarily correlated) vector of random variables (X1, X2, …, XL, T) with T > 0:
   E{T φ(X1, …, XL)} / E{T} ≤ φ( E{T X1}/E{T}, …, E{T XL}/E{T} )

The Algorithm (Type 2) Becomes:
• On frame r, observe the virtual queues Z[r] = (Z1[r], …, ZL[r]) and G[r] = (G1[r], …, GL[r]).
• (Auxiliary variables) Choose γ1[r], …, γL[r] to solve the deterministic problem:
   Maximize:   V φ(γ1[r], …, γL[r]) – ∑_{l=1}^{L} Gl[r] γl[r]
   Subject to: γmin ≤ γl[r] ≤ γmax  ∀l ∈ {1, …, L}
• (Policy selection) Choose π[r] in P to minimize:
   E{ ∑_{l=1}^{L} Zl[r] ŷl(π[r]) – ∑_{l=1}^{L} Gl[r] ŷl(π[r]) | Z[r] } / E{ T̂(π[r]) | Z[r] }
• Then update the virtual queues:
   Zl[r+1] = max[Zl[r] – clT[r] + yl[r], 0]
   Gl[r+1] = max[Gl[r] + γl[r]T[r] – yl[r], 0]
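As a concrete instance of the auxiliary-variable step, suppose (an assumption for illustration, not from the slides) the utility is separable, φ(γ) = ∑l log(1 + γl). The maximization then decouples across l, and each γl[r] has a closed-form clipped solution:

```python
# Concrete instance of the auxiliary-variable step, assuming the separable
# utility phi(γ) = Σ_l log(1 + γ_l). Maximizing V*phi(γ) - Σ_l G_l[r]*γ_l
# then decouples per l with a closed-form clipped solution.

def choose_aux(G, V, gamma_min=0.0, gamma_max=10.0):
    """Maximize V*log(1+γ) - Gl*γ over [gamma_min, gamma_max] for each l."""
    gammas = []
    for Gl in G:
        if Gl <= 0:
            g = gamma_max            # no queue pressure: push utility up
        else:
            g = V / Gl - 1.0         # stationary point of V*log(1+γ) - Gl*γ
        gammas.append(min(max(g, gamma_min), gamma_max))
    return gammas

print(choose_aux(G=[0.5, 2.0, 50.0], V=10.0))  # -> [10.0, 4.0, 0.0]
```

Larger queue backlogs Gl[r] push γl[r] (and hence the target time average) down, while larger V pushes it toward the unconstrained utility maximizer, mirroring the O(1/V) utility / O(V) backlog tradeoff in the Type 1 theorem.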