A Stochastic Primal-Dual Algorithm for Joint Flow Control and MAC Design in Multi-hop Wireless Networks
Junshan Zhang and Dong Zheng
Department of Electrical Engineering
Arizona State University
March 22, 2006
Multi-hop Wireless Networks

[Figure: a multi-hop wireless network subject to bursty traffic, channel fading, and random access.]

Stochastic attributes of wireless networks:
• time-varying channel conditions;
• random access, co-channel interference;
• burstiness of traffic flows;
• mobility, dynamic network topology;
• …
Objective and Approach
•Goal: QoS provisioning through joint flow control and
MAC design in multi-hop random access networks.
•Approach: (stochastic) network utility maximization
• Treat rate control as a utility maximization problem
• Different layers function “cooperatively” to achieve the
optimum point (equilibrium point).
Basic Setting
• Consider a wireless network modelled as a directed graph $G = (N, E)$.
• Let $U_s(x_s)$ denote the utility function of flow $s$, where $x_s$ is the flow rate.
• Consider a persistence scheme with transmission probabilities $\{p_{(i,j)},\ \forall (i,j) \in E\}$.
• Let $N_I^{to}(i)$ denote the set of nodes whose transmissions interfere with node $i$'s reception.
• Use $N_{in}(i)$ to denote the set of nodes from which node $i$ receives traffic, $N_{out}(i)$ to denote the set of nodes to which node $i$ is sending packets, and $N_I^{from}(i)$ to denote the set of nodes whose reception is interfered with by node $i$'s transmission.
Problem Formulation
$$\Xi:\ \max_{\{x_s\}} \sum_{s \in S} U_s(x_s)$$
subject to
$$\sum_{s \in S((i,j))} x_s \le c_{(i,j)}\, p_{(i,j)} \prod_{k \in N_I^{to}(j)} (1 - P_k), \quad \forall (i,j)$$
$$\sum_{j \in N_{out}(i)} p_{(i,j)} = P_i, \quad \forall i$$
$$0 \le x_s \le M_s, \quad \forall s$$
$$0 \le P_i \le 1, \quad \forall i,$$
where $M_s$ is the maximum data rate of flow $s$, $c_{(i,j)}$ is the average transmission rate, and the utility function $U_s(\cdot)$ takes the following general form [Mo, Walrand 00]:
$$U_\kappa(r_i) = \begin{cases} w_i \log r_i, & \text{if } \kappa = 1 \\ w_i (1-\kappa)^{-1} r_i^{1-\kappa}, & \text{otherwise.} \end{cases}$$

Observation: Problem $\Xi$ is non-convex and non-separable.
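As a quick side illustration, here is a minimal Python sketch of this utility family (the function name and defaults are ours, not from the talk):

```python
import numpy as np

def utility(rate, w=1.0, kappa=1.0):
    """Weighted kappa-fair utility U_kappa(r) in the sense of [Mo, Walrand 00].

    kappa = 1 gives proportional fairness (w * log r); larger kappa
    approaches max-min fairness.
    """
    if kappa == 1.0:
        return w * np.log(rate)
    return w * rate ** (1.0 - kappa) / (1.0 - kappa)
```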
Related work
1. Internet TCP/AQM: [Kelly, Maulloo, Tan 98], [Low, Lapsley 99], [Srikant 05], and many more (apologies to those omitted for lack of space);
2. Joint flow control/routing/MAC/PHY design: [Chiang 04], [Lin, Shroff 05], [Wang, Kar 05], [Chen, Low, Chiang, Doyle 06], [Lee, Chiang, Calderbank 06], …

A common assumption: most of the works above assume deterministic feedback in their distributed algorithms.
Outline
1. Deterministic primal-dual algorithm;
2. Modelling stochastic perturbation;
3. Convergence of the stochastic primal-dual algorithm;
4. Rate of convergence.
Deterministic Version: Problem Transformation
Let $\tilde{x}_s = \log(x_s)$ [Lee, Chiang, Calderbank 06]. We obtain
$$P:\ \max_{\{\tilde{x}_s\}} \sum_{s \in S} U_s'(\tilde{x}_s)$$
subject to
$$\log\Big(\sum_{s \in S((i,j))} \exp(\tilde{x}_s)\Big) - \log(p_{(i,j)}) - \sum_{k \in N_I^{to}(j)} \log(1 - P_k) \le 0, \quad \forall (i,j)$$
$$\sum_{j \in N_{out}(i)} p_{(i,j)} = P_i, \quad \forall i$$
$$-1 \le \tilde{x}_s \le \tilde{M}_s, \quad \forall s$$
$$0 \le P_i \le 1, \quad \forall i,$$
where $U_s'(\tilde{x}_s) = U_s(\exp(\tilde{x}_s))$.

Observation: Problem P is convex and separable if $\kappa \ge 1$.
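The convexity rests on the log-sum-exp term being convex in $\tilde{x}$ (and $-\log(1 - P_k)$ being convex in $P_k$). A throwaway midpoint-convexity spot check, entirely ours:

```python
import numpy as np

rng = np.random.default_rng(0)

def lse(x):
    # log-sum-exp: the transformed traffic term log(sum_s exp(x_s))
    return np.log(np.sum(np.exp(x)))

# Midpoint convexity: lse((a + b) / 2) <= (lse(a) + lse(b)) / 2
for _ in range(1000):
    a, b = rng.normal(size=5), rng.normal(size=5)
    assert lse((a + b) / 2) <= (lse(a) + lse(b)) / 2 + 1e-12
print("log-sum-exp passed the midpoint convexity check on all samples")
```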
Lagrange Dual Approach
The Lagrangian function is
$$L(\tilde{x}, p, \lambda) = \sum_s U_s'(\tilde{x}_s) - \sum_{(i,j)} \lambda_{(i,j)} \log\Big(\sum_{s \in S((i,j))} \exp(\tilde{x}_s)\Big) + \sum_{(i,j)} \lambda_{(i,j)} \log\Big(p_{(i,j)} \prod_{k \in N_I^{to}(j)} (1 - P_k)\Big).$$
Then the Lagrange dual function is
$$Q(\lambda) = \max_{\substack{\sum_{j \in N_{out}(i)} p_{(i,j)} = P_i \\ 0 \le P \le 1,\ -1 \le \tilde{x} \le \tilde{M}}} L(\tilde{x}, p, \lambda),$$
and the dual problem is given by
$$D:\ \min_{\lambda \ge 0} Q(\lambda).$$
Strong Duality
Proposition:
a) There is no duality gap; i.e., the minimum value of the dual problem D is equal to the maximum value of the primal problem P.
b) Let $\Phi$ be the set of $\lambda$ that minimizes $Q(\lambda)$. Then $\Phi$ is non-empty and compact; and for every $\lambda \in \Phi$, there exists a unique vector $(\tilde{x}^*, p^*)$ that maximizes the Lagrangian function $L(\cdot, \cdot, \cdot)$.
Distributed Primal-dual Alg.
• The source rates are updated by
$$\tilde{x}_s(n+1) = \Bigg[\tilde{x}_s(n) + \epsilon_n \underbrace{\bigg(\dot{U}_s(\exp(\tilde{x}_s(n)))\,\exp(\tilde{x}_s(n)) - \sum_{(i,j) \in L(s)} \lambda_{(i,j)}(n)\, \frac{\exp(\tilde{x}_s(n))}{\sum_{s' \in S((i,j))} \exp(\tilde{x}_{s'}(n))}\bigg)}_{\triangleq\, L_{\tilde{x}_s}(\tilde{x}(n),\, p(n),\, \lambda(n))}\Bigg]_{-1}^{\tilde{M}_s}.$$
• The shadow prices are updated by
$$\lambda_{(i,j)}(n+1) = \Bigg[\lambda_{(i,j)}(n) - \epsilon_n \underbrace{\bigg(\log(p_{(i,j)}(n)) + \sum_{k \in N_I^{to}(j)} \log(1 - P_k(n)) - \log\Big(\sum_{s \in S((i,j))} \exp(\tilde{x}_s(n))\Big)\bigg)}_{L_{\lambda_{(i,j)}}(\tilde{x}(n),\, p(n),\, \lambda(n))}\Bigg]^{+}.$$
• The persistence probabilities are updated by
$$p_{(i,j)}(n+1) = \frac{\lambda_{(i,j)}(n)}{\displaystyle\sum_{k \in N_{out}(i)} \lambda_{(i,k)}(n) + \sum_{(l,m):\, m \in N_I^{from}(i),\, l \in N_{in}(m)} \lambda_{(l,m)}(n)}.$$
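To make the mechanics concrete, here is a minimal numeric sketch of these three updates on a toy two-link line network (the topology, constants, log-utilities, and the omission of the interference terms are all our simplifications, not part of the talk):

```python
import numpy as np

# Toy instance (ours): 3 nodes in a line, two links; flow 0 traverses
# both links, flow 1 uses only link (1, 2). Unit capacities and no
# interference, so each link constraint reduces to sum of rates <= p.
links = [(0, 1), (1, 2)]
flows = {0: [(0, 1), (1, 2)], 1: [(1, 2)]}
N_out = {0: [(0, 1)], 1: [(1, 2)], 2: []}

eps = 0.01                        # constant step (deterministic version)
M_tilde = np.log(10.0)            # upper bound on the log-rates
x = np.zeros(2)                   # x_tilde: log flow rates
lam = {e: 1.0 for e in links}     # shadow prices
p = {e: 1.0 for e in links}       # persistence probabilities

def flows_on(e):
    return [s for s, route in flows.items() if e in route]

for n in range(20000):
    # Source-rate update: projected gradient ascent in x_tilde; for
    # U_s = log x_s the first term dU/dx * x equals 1.
    grad = np.array([1.0 - sum(lam[e] * np.exp(x[s])
                               / np.exp(x[flows_on(e)]).sum()
                               for e in flows[s])
                     for s in flows])
    x = np.clip(x + eps * grad, -1.0, M_tilde)
    # Shadow-price update: projected gradient descent in lambda.
    for e in links:
        g = np.log(p[e]) - np.log(np.exp(x[flows_on(e)]).sum())
        lam[e] = max(lam[e] - eps * g, 0.0)
    # Persistence probabilities: normalize the outgoing link prices.
    for i, out in N_out.items():
        tot = sum(lam[e] for e in out)
        for e in out:
            p[e] = lam[e] / tot if tot > 0 else 1.0 / len(out)

print("rates:", np.exp(x))        # expect roughly [0.5, 0.5]
```

With log-utilities this toy should settle near the proportionally fair split $x_0 = x_1 = 1/2$, with $\lambda_{(1,2)}$ playing the role of the congestion price on the shared link.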
Stochastic Stability
Computing the gradients requires feedback information; unfortunately, in practical systems the feedback is noisy in nature!

A fundamental open question:
Q) Under what conditions on the noisy feedback would the algorithms converge?

Stochastic stability is pertinent to the following issues:
1. the number of users and/or the queue lengths remain finite;
2. the algorithms converge in some stochastic sense.

Related work on stability:
• session-level randomness: [Bonald, Massoulie 01], [Kelly 05], [Lin, Shroff 05], and more;
• rate of convergence around the equilibrium points: [Kelly, Maulloo, Tan 98], [Kelly 03], …;
• arbitrary delay: [La, Rajan 06];
• deterministic feedback estimation error: [Mehyar, Spanos, Low 04].
Stochastic Primal-Dual Alg.
Our focus is on bringing back packet-level dynamics and understanding the convergence of the following stochastic algorithm.

• SA algorithm for source rate updating:
$$\tilde{x}_s(n+1) = \Big[\tilde{x}_s(n) + \epsilon_n \hat{L}_{\tilde{x}_s}(\tilde{x}(n), p(n), \lambda(n))\Big]_{-1}^{\tilde{M}_s}.$$
• SA algorithm for shadow price updating:
$$\lambda_{(i,j)}(n+1) = \Big[\lambda_{(i,j)}(n) - \epsilon_n \hat{L}_{\lambda_{(i,j)}}(\tilde{x}(n), p(n), \lambda(n))\Big]^{+}.$$
• The persistence probability updating rule remains the same.
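A self-contained scalar caricature of such an SA iteration (a single flow with capacity $c$, log-utility, and additive zero-mean feedback noise; all constants are our assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
c = 2.0                  # toy link capacity (assumed)
x_t, lam = 0.0, 1.0      # log-rate and shadow price

for n in range(1, 50001):
    eps = 1.0 / n        # step sizes satisfying A2
    # Noisy gradients: hat L = L + zero-mean noise, i.e., an
    # asymptotically unbiased estimator (A3 holds with alpha = 0),
    # and the noise is a bounded-variance martingale difference (A4).
    gx = 1.0 - lam + rng.normal(scale=0.5)          # L_x = 1 - lam
    gl = np.log(c) - x_t + rng.normal(scale=0.5)    # L_lam = log c - x_t
    x_t = min(max(x_t + eps * gx, -1.0), np.log(10.0))
    lam = max(lam - eps * gl, 0.0)

print("rate:", np.exp(x_t), "~ capacity", c)   # should end up near c
```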
Structure of Stochastic Gradients
^ (¢; ¢; ¢): Observe that
Stochastic gradient L
x
~
s
^ (~
L
x(n); p(n); ¸(n)) = Lx~ (~
x(n); p(n); ¸(n)) + ®s (n) + ³s (n);
x
~
s
s
where ®s (n) is the biased random error in Lx~ (~
x(n); p(n); ¸(n)), given by
s
h
i
¡ L (~
^ (~
®s (n) , En L
x
(n);
p(n);
¸(n))
x(n); p(n); ¸(n)) ;
x
~
x
~
s
and ³s (n) is a martingale di®erence noise
s
h
¡E L
^ (~
^
³s (n) , L
x
(n);
p(n);
¸(n))
x
~
n
x
~
s
s
i
(~
x(n); p(n); ¸(n)) :
Structure of Stochastic Gradients (Cont’d)
Stochastic gradient $\hat{L}_{\lambda_{(i,j)}}(\cdot, \cdot, \cdot)$: observe that
$$\hat{L}_{\lambda_{(i,j)}}(\tilde{x}(n), p(n), \lambda(n)) = L_{\lambda_{(i,j)}}(\tilde{x}(n), p(n), \lambda(n)) + \beta_{(i,j)}(n) + \xi_{(i,j)}(n),$$
where $\beta_{(i,j)}(n)$ is the biased random error of $L_{\lambda_{(i,j)}}(\tilde{x}(n), p(n), \lambda(n))$, given by
$$\beta_{(i,j)}(n) \triangleq E_n\big[\hat{L}_{\lambda_{(i,j)}}(\tilde{x}(n), p(n), \lambda(n))\big] - L_{\lambda_{(i,j)}}(\tilde{x}(n), p(n), \lambda(n)),$$
and $\xi_{(i,j)}(n)$ is a martingale difference noise:
$$\xi_{(i,j)}(n) \triangleq \hat{L}_{\lambda_{(i,j)}}(\tilde{x}(n), p(n), \lambda(n)) - E_n\big[\hat{L}_{\lambda_{(i,j)}}(\tilde{x}(n), p(n), \lambda(n))\big].$$
Technical Assumptions
A1. We assume that the estimators of the gradients are based on the measurements in each iteration only.
A2. Conditions on the step size: $\epsilon_n > 0$, $\epsilon_n \to 0$, $\sum_n \epsilon_n \to \infty$, and $\sum_n \epsilon_n^2 < \infty$.
A3. Condition on the biased error: $\sum_n \epsilon_n |\alpha_s(n)| < \infty,\ \forall s$, and $\sum_n \epsilon_n |\beta_{(i,j)}(n)| < \infty,\ \forall (i,j)$.
A4. Condition on the martingale difference noise: $\sup_n E_n[\zeta_s(n)^2] < \infty,\ \forall s$, and $\sup_n E_n[\xi_{(i,j)}(n)^2] < \infty,\ \forall (i,j)$.
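For instance, $\epsilon_n = 1/n$ satisfies A2; a quick partial-sum sanity check (ours):

```python
import numpy as np

n = np.arange(1, 10**6 + 1, dtype=float)
eps = 1.0 / n
# The harmonic sum grows like log(n) -> infinity, while the
# squared sum converges to pi^2 / 6.
print(f"sum eps   up to 1e6: {eps.sum():.2f} (diverges like log n)")
print(f"sum eps^2 up to 1e6: {(eps**2).sum():.4f} "
      f"(-> pi^2/6 = {np.pi**2 / 6:.4f})")
```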
Main Result on Stochastic Stability
Theorem: Under Conditions A1–A4, the iterates $\{(x(n), \lambda(n), p(n)),\ n = 1, 2, \ldots\}$ generated by the stochastic primal-dual algorithm converge with probability one to the optimal solutions of Problem $\Xi$.

1. Good news: the stochastic algorithm converges to the desired points under Conditions A1–A4.
2. Caution:
• When $\epsilon_n$ or the biased terms do not go to zero, we cannot hope to get convergence w.p.1.
• Even the expectation of the limiting distribution would not be the equilibrium point if the biased terms do not go to zero.
• Nevertheless, we expect that the iterates would converge weakly to some random variable “close” to the equilibrium point.
An Example on Exponential Marking
• Assume exponential marking is used to feed back price information, where the overall non-marking probability for source $s$ is
$$q = \exp\Big(-\sum_{(i,j) \in L(s)} \frac{\lambda_{(i,j)}}{\sum_{s' \in S((i,j))} x_{s'}}\Big).$$
• To estimate the overall price, source $s$ sends $N_n$ packets during round $n$ and counts the non-marked packets, say $K$ packets. The estimate is then $\log(\hat{q})$, where $\hat{q} = K/N_n$.
• By definition, $\alpha_s(n) = \exp(\tilde{x}_s(n))\,\big(E_n[\log(\hat{q})] - \log(q)\big)$.
• Given step sizes satisfying A2, to ensure convergence it suffices to have
$$\sum_n \frac{\epsilon_n}{\sqrt{N_n}} < \infty;$$
e.g., when $\epsilon_n = 1/n$, $N_n \sim O(\log^4(n))$ is sufficient.
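A small simulation of this estimator (ours): mark each of $N$ packets independently with probability $1 - q$, estimate the price as $-\log(K/N)$, and watch the bias shrink as $N$ grows:

```python
import numpy as np

rng = np.random.default_rng(3)
price = 0.8                        # true path price (assumed)
q = np.exp(-price)                 # non-marking probability

for N in [10, 100, 10000]:
    # Repeat the round many times to expose the bias of -log(K/N).
    K = rng.binomial(N, q, size=200000)
    q_hat = np.maximum(K, 1) / N   # guard against K = 0 before the log
    est = -np.log(q_hat)
    print(f"N={N:6d}: mean estimate {est.mean():.4f} (true {price:.4f})")
```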
Proof of the Main Result
Sketch of the proof:
1. First, using the stochastic Lyapunov stability theorem, we establish the recurrence of any arbitrarily small neighborhood of the optimal points. That is, the iterates generated by the stochastic primal-dual algorithm return to a given neighborhood of the optimal points infinitely often.
2. Then we show, via “local analysis,” that the recurrent iterates eventually reside in an arbitrarily small neighborhood of the optimal points.

• Let $(\tilde{x}^*, p^*, \lambda^*)$ be a saddle point.
• Define the Lyapunov function $V(\cdot, \cdot)$ as follows:
$$V(\tilde{x}, \lambda) \triangleq \|\tilde{x} - \tilde{x}^*\|^2 + \min_{\lambda^* \in \Phi} \|\lambda - \lambda^*\|^2.$$
• Note that $V(\cdot, \cdot)$ is well-defined since $\Phi$ is compact.
• It can be shown that $V(\cdot, \cdot)$ is continuous.
Neighbourhood of Saddle Points
• For a given $\mu > 0$, define a neighborhood set
$$A_\mu \triangleq \{(\tilde{x}, \lambda) : V(\tilde{x}, \lambda) \le \mu\}.$$
• We note that in general there can be multiple optimal shadow prices in $\Phi$. So essentially
$$A_\mu = \bigcup_{\lambda^* \in \Phi} \{(\tilde{x}, \lambda) : \|\tilde{x} - \tilde{x}^*\|^2 + \|\lambda - \lambda^*\|^2 \le \mu\}.$$
• $A_\mu$ is compact.
Step I – Lyapunov Stability
Observe that:
$$\begin{aligned} \|\tilde{x}(n+1) - \tilde{x}^*\|^2 \le\ & \|\tilde{x}(n) - \tilde{x}^*\|^2 + 2\epsilon_n (\tilde{x}(n) - \tilde{x}^*)^T L_{\tilde{x}}(\tilde{x}(n), p(n), \lambda(n)) \\ & + 2\epsilon_n (\tilde{x}(n) - \tilde{x}^*)^T (\alpha(n) + \zeta(n)) \\ & + \epsilon_n^2 \|L_{\tilde{x}}(\tilde{x}(n), p(n), \lambda(n)) + \alpha(n) + \zeta(n)\|^2. \end{aligned}$$
Taking conditional expectations,
$$E_n\big(\|\tilde{x}(n+1) - \tilde{x}^*\|^2\big) \le \|\tilde{x}(n) - \tilde{x}^*\|^2 + 2\epsilon_n (\tilde{x}(n) - \tilde{x}^*)^T L_{\tilde{x}}(\tilde{x}(n), p(n), \lambda(n)) + O(\epsilon_n \|\alpha(n)\|) + O(\epsilon_n^2).$$
Similarly, we have that
$$E_n\big(\|\lambda(n+1) - \lambda^*\|^2\big) \le \|\lambda(n) - \lambda^*\|^2 - 2\epsilon_n (\lambda(n) - \lambda^*)^T L_{\lambda}(\tilde{x}(n), p(n), \lambda(n)) + O(\epsilon_n \|\beta(n)\|) + O(\epsilon_n^2).$$
(Cont'd)
By the definition of $V(\cdot, \cdot)$, we have that
$$\begin{aligned} E_n[V(\tilde{x}(n+1), \lambda(n+1))] &\le E_n\big[\|\tilde{x}(n+1) - \tilde{x}^*\|^2 + \|\lambda(n+1) - \lambda^*\|^2\big] \\ &\le \|\tilde{x}(n) - \tilde{x}^*\|^2 + \|\lambda(n) - \lambda^*\|^2 + 2\epsilon_n G(\tilde{x}(n), \lambda(n)) \\ &\quad + O\big(\epsilon_n (\|\alpha(n)\| + \|\beta(n)\|)\big) + O(\epsilon_n^2). \end{aligned}$$
Note that the above inequality is true for all $\lambda^* \in \Phi$, and therefore
$$E_n[V(\tilde{x}(n+1), \lambda(n+1))] \le V(\tilde{x}(n), \lambda(n)) + 2\epsilon_n G(\tilde{x}(n), \lambda(n)) + O\big(\epsilon_n (\|\alpha(n)\| + \|\beta(n)\|)\big) + O(\epsilon_n^2),$$
with the understanding that
$$G(\tilde{x}(n), \lambda(n)) = (\tilde{x}(n) - \tilde{x}^*)^T L_{\tilde{x}}(\tilde{x}(n), p(n), \lambda(n)) - (\lambda(n) - \lambda^*_{\min})^T L_{\lambda}(\tilde{x}(n), p(n), \lambda(n)),$$
where $\lambda^*_{\min} = \arg\min_{\lambda \in \Phi} \|\lambda(n) - \lambda\|^2$.
A Supermartingale Lemma
Let $\{X_n\}$ be an $\mathbb{R}^r$-valued stochastic process, and let $V(\cdot)$ be a real-valued, non-negative function on $\mathbb{R}^r$. Suppose that $\{Y_n\}$ is a sequence of random variables satisfying $\sum_n |Y_n| < \infty$ with probability one. Let $\{F_n\}$ be a sequence of $\sigma$-algebras generated by $\{X_i, Y_i;\ i \le n\}$. Suppose that there exists a compact set $A \subset \mathbb{R}^r$ such that for all $n$,
$$E_n[V(X_{n+1})] - V(X_n) \le -\epsilon_n \sigma + Y_n, \quad \text{for } X_n \notin A,$$
where $\epsilon_n$ satisfies A2 and $\sigma$ is a positive constant. Then the set $A$ is recurrent for $\{X_n\}$; i.e., $X_n \in A$ for infinitely many $n$ with probability one.
Recurrence of $A_\mu$
It suffices to show that $G(\tilde{x}(n), \lambda(n)) < 0$ when $(\tilde{x}(n), \lambda(n)) \in A_\mu^c$.
We upper-bound $G(\cdot, \cdot)$ using the following facts.
• Since the Lagrangian function is concave in $\tilde{x}$, it follows that
$$L(\tilde{x}(n), p(n), \lambda(n)) - L(\tilde{x}^*, p(n), \lambda(n)) \ge (\tilde{x}(n) - \tilde{x}^*)^T L_{\tilde{x}}(\tilde{x}(n), p(n), \lambda(n)).$$
• Since $L(\tilde{x}, p, \lambda)$ is linear in $\lambda$, we have that
$$L(\tilde{x}(n), p(n), \lambda^*) - L(\tilde{x}(n), p(n), \lambda(n)) = (\lambda^* - \lambda(n))^T L_{\lambda}(\tilde{x}(n), p(n), \lambda(n)).$$
(Cont’d)
Therefore,
$$\begin{aligned} G(\tilde{x}(n), \lambda(n)) &\le L(\tilde{x}(n), p(n), \lambda^*) - L(\tilde{x}^*, p(n), \lambda(n)) \\ &= L(\tilde{x}(n), p(n), \lambda^*) - L(\tilde{x}^*, p^*, \lambda^*) \\ &\quad + L(\tilde{x}^*, p^*, \lambda^*) - L(\tilde{x}^*, p^*, \lambda(n)) \\ &\quad + L(\tilde{x}^*, p^*, \lambda(n)) - L(\tilde{x}^*, p(n), \lambda(n)). \end{aligned}$$
Observe that:
• The first two terms are negative due to the saddle point property of $(\tilde{x}^*, p^*, \lambda^*)$, i.e., $L(\tilde{x}(n), p(n), \lambda^*) \le L(\tilde{x}^*, p^*, \lambda^*) \le L(\tilde{x}^*, p^*, \lambda(n))$.
• The last term is negative because, when $(\tilde{x}^*, \lambda(n))$ is fixed, $p(n)$ is the unique maximizer of $L(\cdot, \cdot, \cdot)$.

Conclusion: there exists $\delta_\mu > 0$ such that $G(\tilde{x}(n), \lambda(n)) < -\delta_\mu$ when $(\tilde{x}(n), \lambda(n)) \in A_\mu^c$. Therefore, $A_\mu$ is recurrent a.s.
Step II – Local Analysis
• We next show that $\{(\tilde{x}(n), \lambda(n)),\ n = 1, 2, \ldots\}$ leaves $A_{3\mu}$ only finitely often with probability one.
• Let $\{n_k,\ k = 1, 2, \ldots\}$ denote the recurrence times such that $(\tilde{x}(n_k), \lambda(n_k)) \in A_\mu$.
• Thus, it suffices to show that there exists $n_{k_0} \in \{n_k,\ k = 1, 2, \ldots\}$ such that for all $n \ge n_{k_0}$, the original iterates $\{(\tilde{x}(n), \lambda(n)),\ n = 1, 2, \ldots\}$ reside in $A_{3\mu}$ almost surely.
One Step Ahead
Recall that
$$\begin{bmatrix} \tilde{x}_s(n+1) \\ \lambda_{(i,j)}(n+1) \end{bmatrix} = \begin{bmatrix} \tilde{x}_s(n) \\ \lambda_{(i,j)}(n) \end{bmatrix} + \epsilon_n \begin{bmatrix} L_{\tilde{x}_s}(\tilde{x}(n), p(n), \lambda(n)) \\ -L_{\lambda_{(i,j)}}(\tilde{x}(n), p(n), \lambda(n)) \end{bmatrix} + \epsilon_n \begin{bmatrix} \alpha_s(n) + \zeta_s(n) \\ \beta_{(i,j)}(n) + \xi_{(i,j)}(n) \end{bmatrix} + \epsilon_n \begin{bmatrix} Z_n^{\tilde{x}_s} \\ Z_n^{\lambda_{(i,j)}} \end{bmatrix},$$
where the $Z_n$ terms account for the projections.
Basic idea: show that these three $\epsilon_n$-scaled terms diminish asymptotically!
Rate of Convergence
• The rate of convergence is concerned with the asymptotic behavior of the normalized errors about the optimal points.
• As is standard, assume that the iterates generated by the stochastic primal-dual algorithm have entered a small neighborhood of an optimal solution $(\tilde{x}^*, \lambda^*)$.
• Define $U_{\tilde{x}}(n) \triangleq (\tilde{x}(n) - \tilde{x}^*)/\sqrt{\epsilon_n}$ and $U_\lambda(n) \triangleq (\lambda(n) - \lambda^*)/\sqrt{\epsilon_n}$.
• Construct $U^n(t)$ as the piecewise-constant interpolation of $U(n) = \{U_{\tilde{x}}(n), U_\lambda(n)\}$; i.e., $U^n(t) = U_{n+i}$ for $t \in [t_{n+i} - t_n,\ t_{n+i+1} - t_n)$, where $t_n \triangleq \sum_{i=0}^{n-1} \epsilon_i$.

[Figure: the piecewise-constant path $U^n(t)$, holding the values $U_n, U_{n+1}, U_{n+2}$ over intervals of lengths $\epsilon_n, \epsilon_{n+1}, \epsilon_{n+2}$.]
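A literal construction of this interpolation in code (ours), given a sequence of iterates and step sizes:

```python
import numpy as np

def interpolate(U, eps, n, t):
    """Piecewise-constant interpolation U^n(t): returns U[n+i] for
    t in [t_{n+i} - t_n, t_{n+i+1} - t_n), with t_k = sum_{i<k} eps[i]."""
    t_abs = np.concatenate(([0.0], np.cumsum(eps)))    # t_0, t_1, ...
    shifted = t_abs[n:] - t_abs[n]                     # endpoints from t_n
    i = np.searchsorted(shifted, t, side="right") - 1  # locate the interval
    return U[n + i]

# Example with eps_k = 1/(k+1): U^0(0.0) = U[0]; U^0(1.2) = U[1],
# since the first interval has width eps_0 = 1.
eps = 1.0 / np.arange(1, 101)
U = np.arange(100.0)
print(interpolate(U, eps, n=0, t=0.0), interpolate(U, eps, n=0, t=1.2))
```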
Assumptions
A5. Let $\theta(n) \triangleq (\tilde{x}(n), \lambda(n))$ and $\phi_n \triangleq (\zeta(n), \xi(n))$. Suppose that for any given small $\rho > 0$, there exists a positive definite symmetric matrix $\Sigma = \sigma\sigma^T$ such that
$$E_n\big[\phi_n \phi_n^T - \Sigma\big]\, I\{|\theta(n) - \theta^*| \le \rho\} \to 0 \quad \text{as } n \to \infty.$$
Define
$$A \triangleq \begin{bmatrix} L_{\tilde{x}\tilde{x}}(\tilde{x}^*, p^*, \lambda^*) & L_{\tilde{x}\lambda}(\tilde{x}^*, p^*, \lambda^*) \\ -L_{\lambda\tilde{x}}(\tilde{x}^*, p^*, \lambda^*) & 0 \end{bmatrix}. \qquad (1)$$
A6. Let $\epsilon_n = 1/n$, and assume $A + I/2$ is a Hurwitz matrix.
Note that it can be shown that the real parts of the eigenvalues of $A$ are all non-positive [Bertsekas 99].
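Checking A6 numerically is straightforward; a sketch using a made-up 2x2 matrix of the form (1) (the entries are hypothetical):

```python
import numpy as np

# Hypothetical entries: L_xx = -2 (strict concavity in x), L_xlam = -1.
A = np.array([[-2.0, -1.0],
              [ 1.0,  0.0]])   # [[L_xx, L_xlam], [-L_lamx, 0]]

eigs = np.linalg.eigvals(A + np.eye(2) / 2)
print("eigenvalues of A + I/2:", eigs)
print("Hurwitz (all real parts < 0):", bool(np.all(eigs.real < 0)))
```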
Main Result on Rate of Convergence
a) Under Conditions A1 and A3–A6, $U^n(\cdot)$ converges weakly to the solution (denoted by $U$) of the Skorohod problem
$$\begin{pmatrix} dU_{\tilde{x}} \\ dU_\lambda \end{pmatrix} = \Big(A + \frac{I}{2}\Big) \begin{pmatrix} U_{\tilde{x}} \\ U_\lambda \end{pmatrix} dt + \sigma\, dw(t) + \begin{pmatrix} dz_{\tilde{x}} \\ dz_\lambda \end{pmatrix};$$
b) If $(\tilde{x}^*, \lambda^*)$ is an interior point of the constraint set, then the limiting process $U$ is a stationary Gaussian diffusion process, and $U(n)$ converges in distribution to a normally distributed random variable with mean zero and covariance $\Sigma$.
c) If $(\tilde{x}^*, \lambda^*)$ is on the boundary of the constraint set, then the limiting process $U$ is a stationary reflected linear diffusion process.
Remarks and Engineering Insights
1) In general, the limit process is a stationary reflected linear diffusion process, not necessarily a standard Gaussian diffusion process.
2) The limit process would be Gaussian if there is no reflection term; for instance, when all the link constraints in Problem P are active at the optimal point.
3) The rate of convergence depends heavily on the smallest eigenvalue of $\big(A + \frac{I}{2}\big)$: the more negative the smallest eigenvalue is, the faster the convergence.
4) Intuitively speaking, the reflection terms help increase the speed of convergence, although this effect unfortunately cannot be characterized exactly.
5) The covariance matrix of the limit process gives a measure of the spread around the equilibrium point, and is typically “smaller” than in the unconstrained case.
Conclusions
1) We have studied joint flow control and MAC design in multi-hop wireless networks with random access, formulating rate control as a network utility maximization problem.
2) We use the Lagrangian dual decomposition method to devise a distributed primal-dual algorithm that provides an exact solution.
3) We have shown that the proposed primal-dual algorithm converges almost surely to the optimal solutions, provided that the estimator is asymptotically unbiased.
4) Our findings on the rate of convergence reveal that, in general, the limit of the interpolated process corresponding to the normalized iterate sequence generated by the primal-dual algorithm is a reflected linear diffusion process, not necessarily a Gaussian diffusion process.