
The Role of Control Information in Wireless Link
Scheduling
by
Matthew R. Johnston
B.S. (EECS), University of California, Berkeley (2008)
S.M. (EECS), Massachusetts Institute of Technology (2010)
Submitted to the Department of Electrical Engineering and Computer
Science
in partial fulfillment of the requirements for the degree of
Doctor of Philosophy
at the
MASSACHUSETTS INSTITUTE OF TECHNOLOGY
February 2015
© Massachusetts Institute of Technology 2015. All rights reserved.
Author . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Department of Electrical Engineering and Computer Science
December 18, 2014
Certified by . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Eytan H. Modiano
Professor
Thesis Supervisor
Accepted by . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Leslie A. Kolodziejski
Chairman, Department Committee on Graduate Theses
The Role of Control Information in Wireless Link Scheduling
by
Matthew R. Johnston
Submitted to the Department of Electrical Engineering and Computer Science
on December 18, 2014, in partial fulfillment of the
requirements for the degree of
Doctor of Philosophy
Abstract
In wireless networks, transmissions must be scheduled to opportunistically exploit
the time-varying capacity of the wireless channels to achieve maximum throughput.
These opportunistic policies require global knowledge of the current network state
to schedule transmissions efficiently; however, providing a controller with complete
channel state information (CSI) requires significant bandwidth. In this thesis, we
investigate the impact of control information on the ability to effectively schedule
transmissions. In particular, we study the tradeoff between the availability and accuracy of CSI at the scheduler and the attainable throughput. Moreover, we investigate
strategies for controlling the network with limited CSI.
In the first half of the thesis, we consider a multi-channel communication system
in which the transmitter chooses one of M channels over which to transmit. We model
the channel state using an ON/OFF Markov process. First, we consider channel probing policies, in which the transmitter probes a channel to learn its state, and uses
the CSI obtained from channel probes to make a scheduling decision. We investigate
the optimal channel probing strategies and characterize the tradeoff between probing
frequency and throughput. Furthermore, we characterize a fundamental limit on the
rate at which CSI must be conveyed to the transmitter in order to meet a constraint
on expected throughput. In particular, we develop a novel formulation of the opportunistic scheduling problem as a causal rate distortion optimization of a Markov
source.
The second half of this thesis considers scheduling policies under delayed CSI, resulting from the transmission and propagation delays inherent in conveying CSI across
the network. By accounting for these delays as they relate to the network topology,
we revisit the comparison between centralized and distributed scheduling, showing
that there exist conditions under which distributed scheduling outperforms the optimal centralized policy. Additionally, we illustrate that the location of a centralized
controller impacts the achievable throughput. We propose a dynamic controller placement framework, in which the controller is repositioned using delayed queue length
information (QLI). We characterize the throughput region under all such policies, and
propose a throughput-optimal joint controller placement and scheduling policy using
delayed CSI and QLI.
Thesis Supervisor: Eytan H. Modiano
Title: Professor
Acknowledgments
This thesis represents the culmination of a lot of work and effort, the majority of
which would not have been possible if not for the advice, guidance and support of
many people. First and foremost, I would like to extend my sincerest gratitude to my
advisor, Eytan Modiano. From day one, six and a half years ago, Eytan has helped
guide my research, and his guidance has shaped me into the researcher and person
that I am today. I cannot thank him enough.
I wish to thank Professor Yury Polyanskiy, who collaborated with me to develop a
significant portion of this thesis. Yury’s endless enthusiasm was refreshing, and drove
me to get the most out of my research. I would also like to thank Professor John
Tsitsiklis, whose teaching helped lay the foundation for my thesis, and who provided
essential guidance for my research at key points throughout this process. Additionally,
much of this research was a product of collaborations with several people. I would
like to thank my coauthors Prof. Hyang-Won Lee and Prof. Isaac Kesslasy, with
whom I had many technical discussions that helped progress my research.
I am very thankful for the people at CNRG who have shared an office with me for
one point or another during my time at MIT. A special thanks to Greg Kuperman,
Sebastian Neumayer, Guner Celik, Marzieh Parandehgheibi, and Georgios Paschos,
who have spent many hours staring at a white board with me, working out problem
formulations and proofs. A huge thanks to them and the other members of CNRG
for providing a great environment to come to every day.
My time at MIT was so enjoyable because of the friends I’ve made along the way.
Thank you to all of them, who have supported me through the toughest times, and
helped me celebrate the best. Even though grad school apparently does come to an
end, I know the friendships I made here will last forever.
None of this would have been possible at all if not for the support and love I
received along the way. I am eternally grateful for the love of my family: my parents,
Leslie and Tom, and my sisters Megan and Rachel. Each one of them has shaped
me into the person I am, and has pushed me and supported me through my entire
academic career. One acknowledgment section could never capture how much you’ve
meant to me. Lastly, thank you to Marta Flory. Your never-ending support and love
helped me get through the toughest parts of this journey. Having you to talk to at
the end of every single day has been so important to me; I couldn't imagine doing
this without you.
This work was supported by the National Science Foundation through grants
CNS-0915988 and CNS-1217048, by the Army Research Office Multidisciplinary
University Research Initiative under grant W911NF-08-1-0238, and by the Office of
Naval Research under grant N00014-12-1-0064.
Contents

1 Introduction  19
  1.1 Related Work  21
    1.1.1 Network Control  21
    1.1.2 Channel Probing  22
    1.1.3 Protocol Information  22
    1.1.4 Scheduling with Delayed CSI  23
  1.2 Our Contributions  24
    1.2.1 Channel Probing  24
    1.2.2 Fundamental Limit on CSI Overhead  25
    1.2.3 Delayed Channel State Information  26
    1.2.4 Throughput Optimal Scheduling with Hidden CSI  27
  1.3 Modeling Assumptions  28
  1.4 Thesis Outline  29

2 Channel Probing in Opportunistic Communication Systems  31
  2.1 System Model  33
    2.1.1 Notation  34
    2.1.2 Optimal Scheduling  35
  2.2 Two-Channel System  36
    2.2.1 Heterogeneous Channels  41
    2.2.2 Simulation Results  43
  2.3 Optimal Channel Probing over Finitely Many Channels  45
    2.3.1 Three Channel System  46
    2.3.2 Arbitrary Number of Channels  48
    2.3.3 Simulation Results  48
  2.4 Infinite-Channel System  49
    2.4.1 Probe-Best Policy  50
    2.4.2 Probe Second-Best Policy  52
    2.4.3 Round Robin Policy  57
    2.4.4 Simulation Results  59
  2.5 Dynamic Optimization of Probing Intervals  59
    2.5.1 Two-Channel System  60
    2.5.2 State Action Frequency Formulation  66
    2.5.3 Infinite-Channel System  70
  2.6 Conclusion  77
  2.7 Appendix  78
    2.7.1 Proof of Lemma 1  78
    2.7.2 Proof of Theorem 1  79
    2.7.3 Proof of Theorem 3  80
    2.7.4 Proof of Lemmas 2 and 3  84

3 Opportunistic Scheduling with Limited Channel State Information: A Rate Distortion Approach  87
  3.1 System Model  89
    3.1.1 Problem Formulation  91
    3.1.2 Previous Work  92
  3.2 Rate Distortion Lower Bound  93
    3.2.1 Traditional Rate Distortion  93
    3.2.2 Causal Rate Distortion for Opportunistic Scheduling  94
    3.2.3 Analytical Solution for Two-Channel System  96
  3.3 Heuristic Upper Bound  97
    3.3.1 Minimum Distortion Encoding Algorithm  98
    3.3.2 Threshold-based Encoding Algorithm  99
  3.4 Causal Rate Distortion Gap  101
  3.5 Application to Channel Probing  104
  3.6 Summary  107
  3.7 Appendix  108
    3.7.1 Proof of Theorem 14  108
    3.7.2 Proof of Theorem 15  111
    3.7.3 Proof of Proposition 4  116

4 Centralized vs. Distributed: Wireless Scheduling with Delayed CSI  119
  4.1 Model and Problem Formulation  121
    4.1.1 Delayed Channel State Information  122
    4.1.2 Scheduling Disciplines  123
  4.2 Centralized vs. Distributed Scheduling  127
  4.3 Tree Topologies  136
    4.3.1 Distributed Scheduling on Tree Networks  137
    4.3.2 On Distributed Optimality  139
    4.3.3 Centralized Scheduling on Tree Topologies  140
  4.4 Clique Topologies  148
    4.4.1 Centralized Scheduling  149
    4.4.2 Distributed Scheduling  149
    4.4.3 Comparison  150
  4.5 Simulation Results  151
  4.6 Partially Distributed Scheduling  154
  4.7 Conclusion  158

5 Controller Placement for Maximum Throughput  159
  5.1 Static Controller Placement  160
    5.1.1 System Model  161
    5.1.2 Controller Placement Example  161
    5.1.3 Optimal Controller Placement  163
    5.1.4 Effect of Controller Placement  164
    5.1.5 Controller Placement Heuristic  165
    5.1.6 Multiple Controllers  167
  5.2 Dynamic Controller Placement  170
    5.2.1 Two-Node Example  174
    5.2.2 Queue Length-based Dynamic Controller Placement  176
    5.2.3 Controller Placement With Global Delayed CSI  185
  5.3 Simulation Results  187
    5.3.1 Infrequent Controller Relocation  191
  5.4 Conclusion  194
  5.5 Appendix  195
    5.5.1 Proof of Theorem 21  195
    5.5.2 Proof of Corollary 5  206
    5.5.3 Proof of Lemma 16  209
    5.5.4 Proof of Theorem 23  212

6 Scheduling over Time Varying Channels with Hidden State Information  221
  6.1 System Model  222
  6.2 Throughput Region  224
  6.3 Dynamic QLI-Based Scheduling Policy  227
  6.4 Simulation Results  228
  6.5 Conclusion  230
  6.6 Appendix  231
    6.6.1 Proof of Theorem 26  231

7 Concluding Remarks  239
List of Figures

1-1 Example wireless network.  27
1-2 Markov chain describing the channel state evolution of each independent channel. State 0 corresponds to an OFF channel, while state 1 corresponds to an ON channel.  28
2-1 System model: transmitter and receiver connected through M independent channels.  33
2-2 Markov chain describing the channel state evolution of each independent channel. State 0 corresponds to an OFF channel, while state 1 corresponds to an ON channel.  34
2-3 Optimal fixed probing interval for a two-channel system as a function of state transition probability p = q. In this example, c = 0.5.  39
2-4 Throughput under the optimal fixed-interval probing policy for a two-channel system as a function of the state transition probability p = q. In this example, c = 0.5.  40
2-5 Two asymmetric Markov chains, where 1 − p1 − q1 ≥ 0 and 1 − p2 − q2 ≥ 0.  41
2-6 Throughput of the 'Probe Channel 1' policy and the 'Probe Channel 2' policy. In this example, p1 is varied from 0 to 1/2, and q1 is chosen so that π1 = 3/4. The second channel satisfies p2 = 1/4 and q2 = 1/12, resulting in π2 = π1.  45
2-7 Comparison of the probe-best policy, the probe second-best policy, and the probe third-best policy as a function of the number of channels in the system. This simulation was run over 2 million probes, with each probe at an interval of 4 time slots.  49
2-8 Illustration of the renewal process. Points represent probing instances, and labels represent probing results. Each renewal interval consists of phase 1 and phase 2.  54
2-9 Comparison of the probe-best policy and the probe second-best policy for varying probing intervals k. In this example, p = q = 0.05.  56
2-10 Comparison of the probe-best policy and the probe second-best policy for varying state transition probabilities p = q. In this example, k = 1.  57
2-11 Optimal decisions based on SAFs. White space corresponds to transient states under the optimal policy; green circles, red boxes, and blue stars correspond to recurrent states where the optimal action is to not probe, probe channel 1, and probe channel 2, respectively.  69
2-12 Comparison of the expected throughput of the probe-best policy and the round robin policy under fixed intervals and under dynamic intervals. The x-axis plots k, the length of the interval. The maximum of each curve represents the optimal policy in each regime. In this example, p = q = 0.05 and c = 0.5.  76
2-13 Comparison of the probe-best policy and round robin for varying values of k, the minimum interval between probes. In this example, p = q = 0.1, and c = 0.5.  77
3-1 System model: a transmitter and receiver connected by M independent channels.  89
3-2 Markov chain describing the channel state evolution of each independent channel.  90
3-3 Information structure of an opportunistic communication system. The receiver measures the channel state X(t), encodes this into a sequence Z(t), and transmits this sequence to the transmitter.  90
3-4 Causal information rate distortion function for different state transition probabilities p for a two-channel opportunistic scheduling system.  97
3-5 Definition of K, the time since the last change in the sequence Z(t), with respect to the values of Z(t) up to time t.  98
3-6 The causal information rate distortion function Rc(D) (Section 3.2) and the upper bound to the rate distortion function (Section 3.3), computed using Monte Carlo simulation. Transition probabilities satisfy p = q = 0.2.  101
3-7 Rate distortion functions for example systems.  103
3-8 Causal information rate distortion lower bound, heuristic upper bound, and probing algorithmic upper bound for a two-channel system with p = q = 0.2.  107
4-1 Feasible link activation under primary link interference. Bold edges represent activated links.  122
4-2 Markov chain describing the channel state evolution of each independent channel.  122
4-3 Delayed CSI structure for centralized scheduling. The controller (denoted by a crown) has full CSI of red bold links, one-hop delayed CSI of green dashed links, and two-hop delayed CSI of blue dotted links.  123
4-4 Example network: all links are labeled by their channel state at the current time. Bold links represent activated links.  126
4-5 Four-node ring topology.  127
4-6 Expected sum-rate throughput for centralized and distributed scheduling algorithms over the four-node ring topology, as a function of channel transition probability p.  129
4-7 Example of combining matchings to generate components. Red links and blue links correspond to maximum cardinality matchings M0 and Mi. The component containing node i is referred to as path Pi.  131
4-8 Abstract representation of a node n's position on multiple conflicting paths.  132
4-9 Recursive distributed scheduling over binary trees.  137
4-10 Example matchings. If link l is required to be in the matching, there exists a new maximal matching including l.  139
4-11 Threshold value p*(k) such that for p > p*(k), distributed scheduling outperforms centralized scheduling on a 2-level, k-ary tree.  143
4-12 Possible scheduling scenarios for the centralized scheduler.  145
4-13 Threshold value p*(n) such that for p > p*(n), distributed scheduling outperforms centralized scheduling on an n-level binary tree.  147
4-14 A six-node sample network.  152
4-15 Results for the six-node network in Figure 4-14, over a horizon of 100,000 time slots. The plot shows the fraction of the perfect-CSI throughput obtained as a function of p, the transition probability of the channel state.  152
4-16 A 5x5 grid network.  153
4-17 Results for a 5x5 grid network, over a horizon of 100,000 time slots. The plot shows the fraction of the perfect-CSI throughput obtained as a function of p, the transition probability of the channel state.  153
4-18 Results for a 10-node clique topology, over a horizon of 100,000 time slots. The plot shows the fraction of the perfect-CSI throughput obtained as a function of p, the transition probability of the channel state.  154
4-19 Example subtrees from the tree-partitioning algorithm.  155
4-20 Example partitioning of an infinite tree (only the first four levels shown). Dashed links, dotted links, and solid links each belong to different subtrees. The solid nodes represent controllers, which are located at the root of each subtree. Nodes labeled with B are border nodes.  155
4-21 Illustration of the border link labeling scheme.  156
4-22 Per-link throughput of the tree partitioning scheme, plotted as a function of transition probability p for various subtree depths.  157
5-1 Markov chain describing the channel state evolution of each independent channel. State 0 corresponds to an OFF channel, while state 1 corresponds to an ON channel.  160
5-2 Barbell network.  162
5-3 (Snowflake network) Symmetric network in which node A has degree k1 and node B has degree k2 + 1.  164
5-4 Sum-rate throughput resulting from placing the controller at three possible node locations, with k1 = 4 and k2 = 20, as a function of channel transition probability p = q.  165
5-5 Evaluation of the controller placement heuristic for the barbell network and various channel transition probabilities p = q.  167
5-6 14-node NSFNET backbone network (1991).  168
5-7 Random geometric graph with multiple controllers placed using the myopic placement algorithm, followed by the controller exchange algorithm. Link colors correspond to distance from the nearest controller.  171
5-8 Wireless downlink.  172
5-9 Example 2-node system model.  174
5-10 Throughput regions for different controller scenarios. Assume the channel state model satisfies p = 0.1, q = 0.1, and d1(2) = d2(1) = 1.  176
5-11 Example star network topology where each node measures its own channel state instantaneously, and has d-step delayed CSI of each other node.  184
5-12 Simulation results for different controller placement policies, with channel model parameters p = 0.1, q = 0.1.  189
5-13 Effect of QLI delay on system stability, for p = q = 0.1. Each curve corresponds to a different value of τQ.  190
5-14 Two-level binary tree topology.  190
5-15 Results for different controller placement policies on the tree network in Figure 5-14: DCPS policy with τQ = 150, equal time-sharing, and fixed controller at node 3. Simulation ran for 40,000 time slots with p = q = 0.3.  191
5-16 Fraction of time each node is selected as the controller under DCPS for the topology in Figure 5-14. Blue bars correspond to the system with p = q = 0.1, and red bars correspond to the system with p = q = 0.3.  192
6-1 Wireless downlink.  223
6-2 Markov chain describing the channel state evolution of each independent channel. State 0 corresponds to an OFF channel, while state 1 corresponds to an ON channel.  224
6-3 Symmetric arrival rate versus average queue backlog for a 4-queue system under different DLQ policies. Transition probabilities satisfy p = q = 0.01.  229
6-4 Symmetric arrival rate versus average queue backlog for a 4-queue system under different DLQ policies. Transition probabilities satisfy p = q = 0.1.  230
List of Tables

2.1 Comparison of different probing policies for a two-channel system for a fixed probing interval (6) and time horizon 2,000,000.  44
2.2 Comparison of different probing policies for a fixed probing interval (6) and time horizon 2,000,000. State transition probability p = q = 0.05.  49
2.3 Example renewal interval starting at time 0 and renewing at time 6k. At each probing interval, the second-best channel is probed.  55
2.4 Throughput comparison for different probing policies with p = q = 0.05, k = 6. Simulation assumes 500 channels and a time horizon of 1,000,000 probes.  59
5.1 Results of the controller placement problem over the NSFNET topology. Optimal placement is computed by solving (5.7) via brute force, while heuristic refers to (5.8).  168
5.2 Maximum weight for different controller placement algorithms over random geometric graphs.  172
6.1 Throughput optimal policies for different system models. Each column corresponds to a different amount of information at the controller, and each row corresponds to the memory in the channel. S(t) is the channel state at the current slot, and Q(t) is the queue backlog.  223
Chapter 1
Introduction
Wireless networks have emerged as a fundamental communication architecture in today’s society. The growing popularity of mobile data networks has led to an increased
demand for many wireless applications. Today, the use of wireless networks has expanded to include cellular networks, infrastructure-less peer-to-peer mobile ad hoc
networks (MANETs), wireless sensor networks, city-wide mesh networks for broadband internet access, and more. Meanwhile, mobile data is expected to increase by
an order of magnitude within the next five years [14]. As demand for wireless networks grows, advanced network control schemes must be developed to fully utilize
the available capacity of these networks.
Wireless networks introduce several challenges beyond those of their wired counterparts. First, wireless communication occurs over a shared medium, such that each
transmission is heard by every receiver in the neighborhood of the transmitter. As
a result, simultaneous transmissions interfere with one another, causing a significant
degradation in throughput. Consequently, transmissions must be scheduled to mitigate this interference. Secondly, the wireless channel has a time-varying capacity.
This phenomenon is referred to as fading, and arises on multiple time-scales due to
the mobility of users, shadowing from large objects in the environment, and constructive and destructive interference caused by waveforms traversing multiple paths
from source to destination [65]. Opportunistically transmitting over high-capacity
channels, while avoiding low-quality channels, maximizes throughput over the network.
work. In order to address these challenges, research has focused on the development
of scheduling algorithms that control transmissions to maximize throughput.
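Throughout the thesis, these time-varying channels are modeled as two-state ON/OFF Markov chains (see Figure 1-2), with transition probability p from OFF to ON and q from ON to OFF. A minimal simulation sketch of this channel model (illustrative only; the function name and parameter choices are not from the thesis):

```python
import random

def simulate_onoff_channel(p, q, horizon, seed=0):
    """Simulate a two-state ON/OFF Markov channel.

    p: probability of transitioning OFF -> ON in one slot.
    q: probability of transitioning ON -> OFF in one slot.
    Returns the fraction of slots spent ON, which approaches the
    stationary ON probability p / (p + q) for long horizons.
    """
    rng = random.Random(seed)
    state, on_slots = 0, 0  # state 0 = OFF, state 1 = ON
    for _ in range(horizon):
        if state == 0:
            state = 1 if rng.random() < p else 0
        else:
            state = 0 if rng.random() < q else 1
        on_slots += state
    return on_slots / horizon

# With p = q, the channel is ON half the time in steady state.
frac = simulate_onoff_channel(p=0.1, q=0.1, horizon=200_000)
```

For symmetric transition probabilities p = q = 0.1, the empirical ON fraction settles near the stationary value 0.5, matching the p/(p + q) formula.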
The performance of wireless scheduling policies depends on the availability of
network state information (NSI) at the controller. For example, opportunistically
scheduling transmissions to exploit the time-varying nature of wireless channels necessitates current channel state information (CSI) [38]. Modern wireless technologies,
such as LTE [59] and WiMax [4], allow for the measurement of channel qualities on
the order of a few milliseconds, and this information can be fed back to the scheduler for use in decision-making. Additionally, as traffic demands fluctuate over time,
knowledge of queue length information (QLI) is used to ensure the stable operation
of the network [61].
While NSI can be used to improve system performance, significant bandwidth is
required to supply NSI to the scheduler, reducing the available capacity for communication. In the current cellular 4G standard, the LTE uplink is designed to have a
30% overhead [59]. Furthermore, overheads in military networks have been shown to
grow as large as 99% of all packet transmissions [3]. As networks continue to grow
and additional wireless applications arise, current networks will not have sufficient
capacity to constantly acquire full NSI. Cisco predicts that the number of mobile devices connected to the internet will increase by 50%, and the amount of mobile data
will increase 11-fold by 2018 [14]. Therefore, it is increasingly important to study
techniques for communication with reduced NSI overheads.
In this thesis, we investigate the impact of NSI on the ability to effectively control
the network. In particular, we study the tradeoff between the amount of CSI and
QLI available to the network controller, and throughput attainable by scheduling
algorithms utilizing this information. Moreover, we analyze wireless scheduling in
scenarios where complete NSI is unavailable, either because it is only obtained from
part of the network, it is obtained infrequently, or it is delayed as it is provided to the
controller. We study the effect of this reduced information on the ability to control
the network, and develop scheduling schemes based on imperfect NSI.
1.1 Related Work
Several previous works have studied the relationship between the performance of
network control tasks and the information required at the controller. These works
present different formulations of the wireless scheduling problem; this section elaborates on each of them.
1.1.1 Network Control
In the past several decades, there has been much work studying the optimal control of
wireless networks [5,10,29,33,41,46–48,50,61,62,69]. The area of throughput-optimal
scheduling in wireless networks was pioneered by Tassiulas and Ephremides in [61,62],
and later extended in [48]. These works show that the throughput-optimal schedule
is given by the max-weight policy, where link-weights are computed as the product of
the packet backlog and the current transmission rate over the link. This framework
has been extended to other forms of network control, such as routing, congestion
control, and quality of service (QoS) utility optimization. See [50] for an overview.
In most of these works, current QLI and CSI are assumed to be globally available, and
the performance of these algorithms depends on the accuracy of this information.
While global NSI is essential for optimal centralized scheduling, acquiring network-wide CSI and QLI is impractical. A possible solution is to use distributed scheduling
policies, which only require local NSI, but compute local rather than global optima,
leading to a throughput reduction [46]. Greedy Maximal Scheduling (GMS) was proposed as a low complexity distributed policy, and has been shown to achieve a fraction
of the throughput achievable by a centralized scheme, depending on the topology of
the network [17, 39, 44, 73]. In this approach, the maximum weight transmissions are
added to the schedule if they do not interfere with previously scheduled transmissions.
Distributed scheduling schemes that approach the centralized throughput region are
proposed in [47, 57], but require higher complexity than their greedy counterparts.
Additionally, several authors have applied random-access approaches to maximize
throughput in a fully distributed manner [35, 40, 53, 56]. However, NSI is required to
determine the correct transmission probabilities for these schemes.
1.1.2 Channel Probing
One strategy to obtain local CSI is to explicitly probe channels to learn the current channel state. This is particularly relevant when there are multiple channels
over which to communicate, and a transmitter seeks the channel yielding the highest
throughput. Several works have studied channel probing in multichannel communication settings [2, 11, 12, 23, 27, 28, 31, 71]. Of particular interest is the work in [2]
and [71], in which the authors assume that after a channel is probed, the transmitter
must transmit over that channel. They show that the optimal probing policy is a
myopic policy, which probes the channel with the highest expected transmission rate.
This model is also considered in [31], which characterizes the achievable capacity region as the limit of a sequence of linear programs in terms of state action frequencies
with increasingly large state spaces. The works in [11, 12, 23, 27, 28] consider a model
where the channel state is independent over time; thus, probing a channel in the current slot yields no information about that channel in the future. Furthermore, these
works allow for multiple channel probes per time slot. In [11, 12, 27, 28], probes occur
sequentially, and the transmitter determines when to stop probing and either use one
of the probed channels, or guess the state of an un-probed channel. In [23], all the
channel probes occur simultaneously, and the objective is to determine the subset of
channels to probe.
1.1.3 Protocol Information
An independent branch of research has applied tools from information theory to
characterize the NSI overheads required for various network control tasks. Among
the earliest works to do so is Gallager’s seminal paper [19], where fundamental lower
bounds on the amount of overhead, referred to as protocol information, needed to keep
track of source and destination addresses, and message starting and stopping times,
are derived using rate-distortion theory. Since Gallager’s paper, other researchers
have also considered information theoretic approaches to study protocol overheads
in simple network settings. A discrete-time analog of Gallager’s model is considered
in [18], where a rate distortion framework is used to characterize timing overhead
in a slotted system. In [1], the authors use a rate distortion framework to calculate
the minimum rate at which node location and neighborhood information must be
transmitted in a mobile network, and suggest the corresponding impact on network
capacity. Additionally, the work in [30] considers an information theoretic framework
to study a simple scheduling problem in a wireless network. These works consider
quantifying time-independent NSI. However, these approaches do not apply to scenarios in which the network state process has memory, such as opportunistic wireless
scheduling.
1.1.4 Scheduling with Delayed CSI
As discussed previously, acquiring up-to-date CSI and QLI from across the network
may be unrealistic, especially for large networks. This motivates several works on
throughput optimal scheduling under delayed NSI. In [41], the authors consider a
time-slotted system, in which CSI and QLI updates are only reported once every
T slots, but the transmitter makes a scheduling decision every slot, using delayed
information. They show that delays in the CSI reduce the achievable throughput
region, while delays in QLI do not adversely affect throughput. In [69], Ying and
Shakkottai study throughput optimal scheduling and routing with delayed CSI and
QLI. They show that the throughput optimal policy activates a max-weight schedule,
where the weight on each link is given by the product of the delayed queue length
and the conditional expected channel state given the delayed CSI. Additionally, they
propose a threshold-based distributed policy which is shown to be throughput optimal
(among a class of distributed policies). This work is extended in [70], where the
authors account for the uncertainty in the state of the network topology as well.
Lastly, the work in [54] characterizes the stability region of the network when an
estimate of the channel state is available to the transmitter, rather than the true
channel state. The throughput optimal policy in this case is a max-weight type
policy, where the weight is a conditional expected channel state given the estimate.
1.2 Our Contributions
In this thesis, we study the tradeoff between the amount and the accuracy of the NSI
available at the transmitters, and the resulting throughput performance of wireless
opportunistic scheduling. The first half of the thesis considers a multi-channel wireless
system, in which partial CSI is used to manage the control overheads of scheduling
policies. We investigate optimal channel probing policies to obtain CSI, and provide
a fundamental limit on the rate that CSI needs to be acquired to ensure a throughput
guarantee. The second half of this thesis studies the delays inherent in acquiring CSI
from across a network. We analyze the impact of these delays on system performance,
and compare the optimal centralized approach using delayed CSI with a distributed
approach using local CSI only. Lastly, we study the optimal location to place a
centralized controller in the network, as a function of the CSI delays at each node.
The remainder of this section elaborates on our contributions in these areas.
1.2.1 Channel Probing
Chapter 2 studies channel probing as a means of acquiring CSI. Channel probing
is widely used in modern wireless communication systems [59], in which a probing
signal is used to learn the current channel states, and this CSI is used to schedule
transmissions. However, using channel probing to maintain CSI pertaining to every
channel is impractical, and not necessary for efficient communication. Therefore, the
transmitter must decide which channels should be probed, and how often to probe
these channels.
In this thesis, we study the optimal channel probing and transmission policies for
opportunistic communication. To begin, we fix the time interval between channel
probes. For a system with two channels, we show that the choice of which channel
to probe does not affect the performance of the scheduler, allowing for a closed-form characterization of the expected throughput. When the two channels differ
statistically, we identify scenarios in which it is optimal to always probe one over
the other. For a system with infinitely-many channels, we use renewal theory to
characterize the expected throughput of several probing policies, and show that when
the transmitter makes independent probing and transmission decisions, the myopic
probing policy shown to be optimal in [2,71] is no longer optimal. We conjecture that
the policy that probes the channel with the second-best expected channel state is the
optimal policy in a general system, and prove its optimality in a system with three
channels.
We extend this model to allow for a dynamic optimization of probing intervals
based on the results of the previous channel probes. We formulate this problem as
a Markov decision process and introduce a state action frequency approach to solve
it, which results in an arbitrarily good linear program approximation to the optimal
probing intervals. For the case of an infinite channel system, we explicitly characterize
the optimal probing interval for several probing policies.
1.2.2 Fundamental Limit on CSI Overhead
One of the goals of this thesis is to characterize a fundamental bound on the rate
that CSI needs to be conveyed to the transmitter to ensure a high throughput. Inspired by the work of Gallager in [19], in Chapter 3 we present a novel information-theoretic formulation to quantify this limit. In particular, we consider a
transmitter-receiver pair, connected through multiple time-varying channels. The receiver feeds CSI back to the transmitter, which schedules transmissions using this
information to obtain a high throughput. The problem of minimizing the amount
of information required by the transmitter such that it can effectively control the
network is formulated as a rate distortion problem.
In this work, we design a new distortion metric for opportunistic communication,
capturing the impact of CSI availability on throughput. We incorporate a causality constraint into this rate distortion formulation to reflect the practical constraints of
a real-time communication system. We compute a closed-form lower bound for the
required rate at which CSI must be conveyed to the controller for a two-channel system, where the channel is time-varying according to a Markov process. Additionally,
we propose a practical encoding algorithm to achieve the required throughput with
limited CSI overhead. This analysis leads to an interesting observation regarding the
gap inherent in the causal rate distortion lower bound; we characterize this gap and
discuss scenarios under which it vanishes.
1.2.3 Delayed Channel State Information
The second half of this thesis studies the CSI required to make scheduling decisions
in a wireless network. Due to the transmission and propagation delays over wireless
links, it takes time for each node to acquire CSI pertaining to the other links in
the network. As a consequence, a node has CSI that is delayed with respect to the
current state of the channel. In Chapter 4, we propose a new model for CSI delays
capturing the effect of distance on CSI accuracy, such that nodes have accurate CSI
pertaining to adjacent links, and progressively delayed CSI pertaining to distant links.
Thus, any centralized scheduling scheme is inherently restricted to using delayed CSI.
An alternative approach is a distributed scheme using current local CSI rather than
delayed global CSI; however, distributed approaches make locally optimal decisions
which are often globally suboptimal [46]. We illustrate the impact of delayed CSI
on the throughput performance of centralized scheduling, and prove that as these
delays become significant, there exists a distributed policy that outperforms the optimal centralized policy. We develop sufficient conditions under which there exist such
distributed policies and analytically characterize the throughput performance in tree
and clique networks. In addition, we propose a hybrid approach combining centralized and distributed scheduling to trade off between using delayed CSI and making
suboptimal local decisions.
Since the performance of centralized scheduling depends on the delay of the CSI,
the location of the controller impacts the attainable throughput. Therefore, in Chapter 5 we formulate the problem of finding the optimal controller placement over a
network. For any fixed controller placement, the links near the controller see a higher
expected throughput than links far from the controller, due to the relationship between distance and CSI delay. Consequently, relocating the controller can balance the throughput in the network.

Figure 1-1: Example wireless network

We propose a dynamic controller placement framework,
in which the controller is repositioned using globally available information, such as
delayed QLI, as it is known that delays in QLI do not affect the throughput optimality
of the max-weight policy [41]. We characterize the throughput region under all such
policies using distance-based delayed NSI, and propose a throughput-optimal joint
controller placement and scheduling policy.
1.2.4 Throughput Optimal Scheduling with Hidden CSI
Lastly, we consider the scenario in which the controller has QLI, but no CSI available
with which to schedule transmissions. As previous work suggests [48], scheduling
requires QLI to balance the backlogs throughout the network. However, when the
channel state process has memory, using current QLI is insufficient to optimally control the network. While it is known that delays in QLI do not negatively impact the
throughput, we prove that delays in QLI are necessary for throughput optimality. We
propose a scheduling policy using delayed QLI and prove its throughput optimality
for a wireless downlink. This represents a paradox in which delayed NSI is more
useful than current NSI.
Figure 1-2: Markov Chain describing the channel state evolution of each independent channel. State 0 corresponds to an OFF channel, while state 1 corresponds to an ON channel.
1.3 Modeling Assumptions
In this work, we model a wireless network as a collection of nodes and links, representing wireless transceivers connected through time-varying channels. A scheduling
decision corresponds to a link activation, or a subset of the links over which transmissions occur. To combat interference, we constrain a feasible link activation to be such
that no two adjacent links are activated. This is referred to as a primary interference
model or the node-exclusive interference model [29,47,61], and reflects the constraint
that neighboring transmissions cannot occur simultaneously.
The success of a wireless transmission depends on the strength of the signal received at the destination. Due to the fading characteristics of the channel, the received
signal power fluctuates [65], affecting the ability to decode a transmission. A typical simplifying assumption is that if a packet is received with power (SNR) above a threshold, it is correctly decoded, and otherwise it is lost [72]. From this
assumption, we adopt a two-state channel model, in which the channel state is either
ON or OFF. When the channel is ON, a single packet can be transmitted, while any
transmission over an OFF channel fails.
We consider a time-slotted system in which the length of each time-slot is equal
to the time required to transmit a packet. We assume the channel state remains constant throughout a time slot. This is a typical assumption representing a slow-fading
environment, where the coherence time of the channel is larger than the duration of
a time slot.
We assume each channel has a state independent from every other channel in
the network, reflecting diversity in space [65]. However, channels evolve over time
according to a Markov chain, as shown in Figure 1-2. This Markov channel state
model was introduced by Gilbert in [22], and has been shown to accurately model
the time-varying nature of a Rayleigh fading channel [43, 67, 72]. The ON/OFF
Markov channel state model has been used in many previous works on throughputoptimal scheduling with partial channel state information [2,10,45,71]. The transition
probabilities p and q are related to the coherence time of the channel.
The main motivation behind the Markov model in Figure 1-2, aside from its
simplicity, is that it captures the memory in the channel state process. We assume
the channel model satisfies 1 − p − q ≥ 0, corresponding to channels with positive
memory. In other words, a channel that is in the ON state is more likely to remain
in the ON state than turn OFF, implying that knowledge of the channel state can be
used to estimate the channel state in future time slots. This reduces the amount of
overhead required, since the memory in the channel state process can be utilized for
scheduling.
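As an illustrative sanity check (not part of the thesis), the following Python sketch simulates the Gilbert channel of Figure 1-2 and verifies that the long-run fraction of ON slots approaches the stationary probability π = p/(p + q); the parameter values are invented for the example.

```python
import random

def simulate_gilbert_channel(p, q, T, seed=1):
    """Simulate T slots of the ON/OFF Markov channel: P(OFF->ON)=p, P(ON->OFF)=q."""
    rng = random.Random(seed)
    state, on_slots = 0, 0
    for _ in range(T):
        if state == 0:
            state = 1 if rng.random() < p else 0
        else:
            state = 0 if rng.random() < q else 1
        on_slots += state
    return on_slots / T

p, q = 0.1, 0.2
frac = simulate_gilbert_channel(p, q, 200_000)
print(abs(frac - p / (p + q)) < 0.02)  # empirical ON fraction is close to pi = 1/3
```

With 1 − p − q = 0.7, runs of consecutive ON (or OFF) slots are long, which is the channel memory exploited throughout the thesis.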
1.4 Thesis Outline
The remainder of the thesis is organized as follows. Chapter 2 presents the channel
probing framework, and proposes optimal strategies of probing channels to obtain
CSI. Chapter 3 considers a fundamental lower bound on the rate of CSI acquisition
required by the transmitter using causal rate distortion theory. Chapter 4 studies
throughput optimal scheduling with delayed CSI, and shows that with enough delay,
distributed scheduling outperforms centralized scheduling. Chapter 5 studies the
problem of throughput-optimal controller placement, in terms of the delayed CSI
and QLI available to each node in the network. Lastly, Chapter 6 studies a wireless
downlink for which CSI is not available to the base station.
Chapter 2

Channel Probing in Opportunistic Communication Systems
Consider a system in which a transmitter has access to multiple channels over which
to communicate. The state of each channel evolves independently from all other
channels, and the transmitter does not know the channel states a priori. The transmitter is allowed to probe a single channel after a predefined time interval to learn
the current state of that channel. Using the information obtained from the channel
probes and the memory in the channel state process, the transmitter selects a channel
in each time-slot over which to transmit, with the goal of maximizing throughput, or
the number of successful transmissions.
This framework applies broadly to many opportunistic communication systems,
in which there exists a tradeoff between overhead and performance. When there is
a large number of channels over which to transmit, or a large number of users to
transmit to, it may be impractical to learn the channel state information (CSI) of
every channel before scheduling a transmission; consequently, the transmitter may
be restricted to using partial channel state information to make a decision. The transmitter must decide how much information to obtain, and
which information is needed in order to make efficient scheduling decisions.
In the context of channel probing, the decision of what information to obtain
translates to the decision of which channel to probe. We refer to this decision as
the probing policy. Similarly, the decision of how much information to acquire translates to deciding how often to probe channels for CSI. This decision is referred to
throughout this work as the probing interval. We consider both the scenario in which
the probing interval is constant between channel probes, and the scenario where the
probing interval is allowed to vary based on the channel probing history.
In this work, we study channel probing for wireless opportunistic communication,
in which the transmitter is able to transmit over a channel other than that which was
probed.¹ In a system with two channels, we show that the choice of which channel
to probe does not affect the expected throughput. Additionally, we identify scenarios
such that when the probability distribution of the channel state differs between the
two channels, it is optimal to always probe one of the channels. For a system with an
asymptotically large number of channels, we show that the myopic policy in [2, 71] is
no longer optimal. Specifically, we use renewal theory to prove that a simple policy,
namely the policy which probes the channel that is second-most likely to be ON,
has a higher per-slot expected throughput. We characterize the per-slot throughput
for these policies, and calculate the optimal fixed probing interval as a function of
a probing cost. Furthermore, we prove the optimality of this policy for a system of
three channels, and conjecture that this policy is in fact optimal for systems with any
number of channels. In the second half of the work, we extend our model to allow for
a dynamic optimization of the probing intervals based on the results of past channel
probes. We formulate the problem as a Markov decision process, and introduce a
state action frequency approach to solve for the optimal probing intervals. For the
case of an infinite system of channels, we explicitly characterize the optimal probing
interval for various probing policies.
The remainder of this chapter is organized as follows. We describe the model
and problem formulation in detail in Section 2.1. In Section 2.2, we analyze the
channel probing problem for a system with two channels. In Section 2.3, we find
the optimal probing policy for a system with three channels, and conjecture the
optimal policy in a general system. We extend this to an infinite channel system in Section 2.4, and apply renewal theory to show that the myopic policy is suboptimal, by analytically computing the expected per-slot throughput of another policy which is proven to outperform the myopic policy of [2]. In Section 2.5, we solve for the optimal probing intervals when a fixed cost is associated with probing.

¹ Preliminary versions of this work appeared in [37] and [36].
2.1 System Model
Figure 2-1: System model: transmitter and receiver connected through M independent
channels
Consider a transmitter and a receiver that communicate using one of M independent channels, as shown in Figure 2-1. Assume time is slotted and at every time
slot, each channel is either in an OFF state or an ON state. Channels are i.i.d. with
respect to each other, and evolve across time according to a discrete time Markov
process described by Figure 2-2.
At each time slot, the transmitter chooses a single channel over which to transmit.
If that channel is ON, then the transmission is successful; otherwise, the transmission
fails. We assume the transmitter does not receive feedback regarding previous transmissions.² The objective is to maximize the expected sum-rate throughput, equal to
the number of successful transmissions over time.
The transmitter obtains channel state information (CSI) by explicitly probing
channels at predetermined intervals. In particular, the transmitter probes the receiver
every k slots for the state of one of the channels at the current time. Assume this
information is delivered instantaneously, which is the same assumption made in many previous works (e.g. [2, 23]).

² If such feedback exists in the form of higher-layer acknowledgements, it arrives after a significant delay and is not useful for learning the channel state.

Figure 2-2: Markov chain describing the channel state evolution of each independent channel. State 0 corresponds to an OFF channel, while state 1 corresponds to an ON channel.

The transmitter uses the history of channel probes to
make a scheduling decision. We emphasize that the transmitter may use a channel
other than that which was probed for transmission. For example, if the transmitter
probes a channel and it is found to be OFF, the transmitter can use a different channel
which is more likely to be ON.
2.1.1 Notation
Let S_i(t) be the state of channel i at time t, where S_i(t) = 1 corresponds to a channel that is ON at time t, and S_i(t) = 0 corresponds to a channel in the OFF state. The transmitter has an estimate of this state based on previous probes and the channel state distribution. Define the belief of a channel to be the probability that a channel is ON given the history of channel probes. For any channel i that was last probed k slots ago and was then in state s_i, the belief x_i is given by

x_i(t) = P(channel i is ON | probing history)
       = P(S_i(t) = 1 | S_i(t − k) = s_i),    (2.1)

where the second equality follows from the Markov property of the channel state process. The above probability is computed using the k-step transition probabilities
of the Markov chain in Figure 2-2:

p^k_{00} = (q + p(1 − p − q)^k)/(p + q),   p^k_{01} = (p − p(1 − p − q)^k)/(p + q),
p^k_{10} = (q − q(1 − p − q)^k)/(p + q),   p^k_{11} = (p + q(1 − p − q)^k)/(p + q).    (2.2)
Throughout this work, we assume that 1 − p − q ≥ 0, corresponding to channels with
“positive memory.” The positive memory property ensures that a channel that was
ON k slots ago is more likely to be ON at the current time than a channel that was OFF k slots ago. This allows the transmitter to make efficient scheduling decisions without explicitly obtaining CSI at each time slot. Mathematically, this property is described by the set of inequalities:

p^i_{01} ≤ p^j_{01} ≤ p^k_{11} ≤ p^l_{11}   for all i ≤ j and all l ≤ k.    (2.3)
As the CSI of a channel grows stale, the probability that the channel is in the ON state approaches π, the stationary distribution of the chain in Figure 2-2:

lim_{k→∞} p^k_{01} = lim_{k→∞} p^k_{11} = π = p/(p + q).    (2.4)
Lastly, let τ^k(·) be the function representing the change in belief of a channel over k time-slots when no new information regarding that channel is obtained:

τ^k(x_i) = x_i p^k_{11} + (1 − x_i) p^k_{01}.    (2.5)

This function is used throughout this chapter to analyze the state transition properties of the system.
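The closed forms in (2.2) and the belief update (2.5) translate directly into code. The following small Python sketch (parameter values are invented for illustration) checks the one-step special case and shows a stale belief converging to π.

```python
def k_step_on_prob(p, q, k, from_on):
    """P(channel is ON k slots later | it started ON/OFF), closed forms from (2.2)."""
    r = (1 - p - q) ** k
    return (p + q * r) / (p + q) if from_on else (p - p * r) / (p + q)

def tau(p, q, k, x):
    """Belief update tau^k(x) of (2.5): ON-probability k slots after belief x."""
    return x * k_step_on_prob(p, q, k, True) + (1 - x) * k_step_on_prob(p, q, k, False)

p, q = 0.1, 0.2
# At k = 1, the closed forms reduce to the one-step chain probabilities.
assert abs(k_step_on_prob(p, q, 1, False) - p) < 1e-12
assert abs(k_step_on_prob(p, q, 1, True) - (1 - q)) < 1e-12
# A stale belief decays toward the stationary probability pi = p/(p + q) = 1/3.
print(round(tau(p, q, 1000, 0.9), 6))
```

The printed value is (up to rounding) π itself, matching the limit in (2.4).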
2.1.2 Optimal Scheduling
Since the objective is to maximize the expected sum-rate throughput, the optimal
transmission decision at each time slot is given by the maximum likelihood (ML)
rule, which is to transmit over the channel that is most likely to be ON, i.e. the
channel with the highest belief. The expected throughput in a time slot is therefore
given by

max_i x_i(t),    (2.6)

where x_i(t) is the belief of channel i at time t. Following the linearity of the state transition function τ^k(x_i) in (2.5), and the positive memory assumption, the optimal
scheduling decision remains the same between channel probes, as no additional CSI
is obtained.
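A minimal sketch of the ML rule in (2.6), with invented belief values:

```python
def ml_schedule(beliefs):
    """ML rule of (2.6): transmit over the channel with the highest belief."""
    return max(range(len(beliefs)), key=lambda i: beliefs[i])

print(ml_schedule([0.40, 0.70, 0.55]))  # channel 1 has the highest belief
```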
2.2 Two-Channel System
To begin, we consider a two-channel system, and formulate the optimal probing strategy using dynamic programming (DP) over a finite horizon of length N . Each index
n corresponds to a time slot at which a probing decision is made. Assume there are k
time slots between channel probes; thus, index n corresponds to time slot t = kn. The
system state at each probing index n is equal to the vector (x_1(n), x_2(n)), the beliefs of channel 1 and channel 2 as defined in (2.1). Let f^k(x_1, x_2) be the accumulated throughput over the k slots between channel probes when channel 1 is probed. The function f^k(x_1, x_2) is computed by conditioning on the state of channel 1. If channel 1 is ON, which occurs with probability x_1, then the transmitter uses that channel for k slots, resulting in throughput Σ_{i=0}^{k−1} p^i_{11}. If the probed channel is OFF, then the other channel is used for transmission over those k slots, yielding throughput Σ_{i=0}^{k−1} τ^i(x_2). Consequently, the expected accumulated throughput is given by

f^k(x_1, x_2) = x_1 Σ_{i=0}^{k−1} p^i_{11} + (1 − x_1) Σ_{i=0}^{k−1} τ^i(x_2).    (2.7)

Similarly, in terms of the above definition, f^k(x_2, x_1) is the accumulated throughput over the k slots between channel probes when channel 2 is probed.
We proceed by developing the DP value function for each probing decision. Let
J^i_n be the expected reward after the nth probe if the choice is made to probe channel
i at the current probing instance, and then follow the optimal probing policy for all
subsequent probes. The expected reward after the last probe is given by:

J_N(x_1, x_2) = max( J^1_N(x_1, x_2), J^2_N(x_1, x_2) )    (2.8)
J^1_N(x_1, x_2) = f^k(x_1, x_2)    (2.9)
J^2_N(x_1, x_2) = f^k(x_2, x_1)    (2.10)

Equations (2.9) and (2.10) follow since N is the final channel probe (in a time horizon of length N), and thus the only reward is the immediate reward, which is given by (2.7). At probing time 0 ≤ n < N, the expected reward function is defined recursively. If the decision at probe n is to probe channel 1, then an expected throughput of f^k(x_1, x_2) is accumulated between probes n and n + 1, and at probe n + 1, the belief of channel 1 will be p^k_{11} (p^k_{01}) if the probed channel was ON (OFF), while the belief of channel 2, which was not probed, will be τ^k(x_2). Thus, J_n(x_1, x_2) is defined recursively as:

J_n(x_1, x_2) = max( J^1_n(x_1, x_2), J^2_n(x_1, x_2) )    (2.11)
J^1_n(x_1, x_2) = f^k(x_1, x_2) + x_1 J_{n+1}(p^k_{11}, τ^k(x_2)) + (1 − x_1) J_{n+1}(p^k_{01}, τ^k(x_2))    (2.12)
J^2_n(x_1, x_2) = f^k(x_2, x_1) + x_2 J_{n+1}(τ^k(x_1), p^k_{11}) + (1 − x_2) J_{n+1}(τ^k(x_1), p^k_{01})    (2.13)
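The recursion in (2.8)-(2.13) is straightforward to evaluate numerically. The following sketch is a direct transcription of the equations with illustrative parameters (it is not code from the thesis); it computes the rewards of the two probing choices at the first probe, and their equality anticipates Theorem 1 below.

```python
p, q, k, N = 0.1, 0.3, 3, 4   # illustrative channel parameters and horizon

def tau_k(x, steps):
    """Belief after `steps` slots with no new information (eq. 2.5, iterated)."""
    for _ in range(steps):
        x = x * (1 - q) + (1 - x) * p
    return x

def f(x1, x2):
    """Expected throughput over k slots when channel 1 is probed (eq. 2.7)."""
    return x1 * sum(tau_k(1.0, i) for i in range(k)) \
        + (1 - x1) * sum(tau_k(x2, i) for i in range(k))

p11k, p01k = tau_k(1.0, k), tau_k(0.0, k)   # k-step probabilities p^k_11, p^k_01

def J(n, x1, x2):
    """Optimal expected reward from probe n onward (eqs. 2.8-2.13)."""
    j1, j2 = f(x1, x2), f(x2, x1)
    if n < N:  # not the final probe: add the expected future reward
        j1 += x1 * J(n + 1, p11k, tau_k(x2, k)) + (1 - x1) * J(n + 1, p01k, tau_k(x2, k))
        j2 += x2 * J(n + 1, tau_k(x1, k), p11k) + (1 - x2) * J(n + 1, tau_k(x1, k), p01k)
    return max(j1, j2)

# Rewards of the two probing choices at the first probe, for arbitrary beliefs:
x1, x2 = 0.5, 0.8
j1 = f(x1, x2) + x1 * J(1, p11k, tau_k(x2, k)) + (1 - x1) * J(1, p01k, tau_k(x2, k))
j2 = f(x2, x1) + x2 * J(1, tau_k(x1, k), p11k) + (1 - x2) * J(1, tau_k(x1, k), p01k)
print(abs(j1 - j2) < 1e-9)  # True: both probing choices yield the same total reward
```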
The dynamic program in (2.8)-(2.13) can be solved to compute the optimal probing
policy for the two-channel system. To begin with, we prove the following property of the immediate reward after probing, f^k(x_1, x_2).

Lemma 1. f^k(x_1, x_2) = f^k(x_2, x_1)
The proof of Lemma 1 is given in the Appendix. Lemma 1 states that the immediate reward for probing channel 1 is the same as that for probing channel 2, for
all probing intervals k. This is a consequence of the ability of the transmitter to
choose over which channel to transmit after a channel probe, and accounts for the
key difference between the model considered in this chapter, and models considered
in previous works [2, 71]. Using this result, we present the main result of this section.
Theorem 1. For a two-user system with independent channels evolving over time according to an ON/OFF Markov chain with transition probabilities p and q, and with probing epochs fixed at intervals of k slots, the total reward from probing channel 1 at each channel probe is equal to that of probing channel 2.
Corollary 1. The channel probing policy which always probes channel 1 (2) is optimal
in a two-channel system.
The proof of Theorem 1 is given in the Appendix, and follows by induction using Lemma 1 and the affinity of the expected reward function in (2.8)-(2.13).
Corollary 1 follows directly from Theorem 1. Intuitively, when a channel is probed,
the transmitter receives information about the optimal channel to use until the next
probe. For example, if the probed channel is ON, it is optimal to transmit over that
channel until the next probe occurs. On the other hand, if the probed channel is
OFF, it is optimal to transmit over the un-probed channel, because the belief of that
channel will always be higher than that of the OFF channel, based on the inequalities
in (2.3). Thus, the only information required from the channel probe is which channel
to transmit over until the subsequent channel probe, and this information can be
obtained through probing either channel.
This result is in contrast to the result in [71], which proves that the optimal
decision is to probe the channel with the highest belief. However, their model assumes
that a transmission must occur over the probed channel, whereas our model allows
the transmitter to choose the channel over which to transmit independently based
on the result of the probe. Consequently, the myopic policy of [71] is not a uniquely
optimal policy in this setting.
Theorem 1 is used to determine the optimal fixed probing interval. Clearly, probing more frequently yields higher throughput, but requires more resources as well. To
capture this, we associate a fixed cost c with each probe. The goal is to determine
the probing interval k that maximizes the difference between throughput earned and
cost accumulated.
Theorem 2. Assume a fixed-interval probing scheme with probing cost c. The optimal probing interval is given by

k* = arg max_k  (π p^k_{10} − c(p + q)) / (k(p + q)).    (2.14)

Figure 2-3: Optimal fixed probing interval for a two-channel system as a function of state transition probability p = q. In this example, c = 0.5.
Proof. From Corollary 1, the optimal probing policy is that which always probes
channel 1. Under this policy, the belief of channel 2 equals the steady state probability
of being in the ON state (π) given in (2.4). Channel 1 is probed every time, and will
be ON a fraction π of the time. When channel 1 is ON, a throughput of Σ_{i=0}^{k−1} p_{11}^i is
obtained, and when it is OFF, the throughput is simply πk, the expected throughput
yielded by channel 2 over an interval of duration k. Consequently, the expected
per-slot throughput accounting for the cost of probing is given by
(1/k) [ −c + π Σ_{i=0}^{k−1} p_{11}^i + (1 − π)πk ] = −c/k + π + π p_{10}^k / ( k(p + q) ).        (2.15)
The proof follows by maximizing the above expression with respect to k.
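The maximization in Theorem 2 is one-dimensional and easy to carry out numerically. The sketch below (ours, not the thesis's code) evaluates the per-slot reward in (2.15) and searches over a finite range of intervals:

```python
def p11(i, p, q):
    """i-step ON->ON transition probability of the two-state channel chain."""
    return (p + q * (1 - p - q) ** i) / (p + q)

def per_slot_reward(k, p, q, c):
    """Expected per-slot throughput net of the probing cost, as in (2.15)."""
    pi = p / (p + q)                                 # steady-state ON probability
    on_reward = sum(p11(i, p, q) for i in range(k))  # reward while the probed channel is ON
    return (-c + pi * on_reward + (1 - pi) * pi * k) / k

def optimal_interval(p, q, c, k_max=200):
    """Fixed probing interval maximizing (2.15), searched up to k_max."""
    return max(range(1, k_max + 1), key=lambda k: per_slot_reward(k, p, q, c))
```

In line with Figure 2-3, the interval returned for p = q = 0.1 is smaller than for p = q = 0.05; at larger p the net value of probing goes negative and the search runs to k_max, corresponding to the no-probing regime (the sketch itself always probes).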
Figure 2-3 shows the optimal probing interval as a function of the state transition
probability p. As p increases, each probe gives less information for the same cost.
Figure 2-4: Throughput under the optimal fixed-interval probing policy for a two-channel system as a function of the state transition probability p = q. In this example, c = 0.5.
Thus, as the transition probability starts to increase, the optimal probing interval
decreases, since information needs to be obtained more frequently to account for the
reduced information in each probe. As p continues to grow, the reward from probing
becomes smaller than the cost to probe, and it becomes optimal to not probe.
Figure 2-4 shows the throughput under the optimal probing interval from Theorem
2 for various transition probabilities. As the state transition probability increases,
throughput decreases. Note the optimal throughput does not drop below the steady
state probability π, because at that point it is optimal not to probe, due to the high
probing cost, and guess which channel to use.
Theorems 1 and 2 combine to characterize the optimal fixed-interval probing policy for a two-channel system. However, when the two channels are not identically
distributed, the optimal probing decision depends on the channel statistics, as shown
in Section 2.2.1. Furthermore, if the probing epochs are not fixed, the decision to
probe depends on the results of the previous probe, yielding an advantage to probing
one channel over the other, as shown in Section 2.5.
Figure 2-5: Two asymmetric Markov Chains, where 1 − p1 − q1 ≥ 0, and 1 − p2 − q2 ≥ 0.
2.2.1 Heterogeneous Channels
In this section, we extend the results of the previous section to the case where the two
channels differ statistically, i.e. channel 1 evolves in time according to the Markov
chain in Figure 2-5a, and channel 2 evolves according to the chain in Figure 2-5b.
Denote the k-step transition probability of channel 1 as a_{i,j}^k and the k-step transition
probability of channel 2 as b_{i,j}^k:

a_{11}^k = [ p_1 + q_1 (1 − p_1 − q_1)^k ] / (p_1 + q_1),    a_{01}^k = [ p_1 − p_1 (1 − p_1 − q_1)^k ] / (p_1 + q_1)        (2.16)

b_{11}^k = [ p_2 + q_2 (1 − p_2 − q_2)^k ] / (p_2 + q_2),    b_{01}^k = [ p_2 − p_2 (1 − p_2 − q_2)^k ] / (p_2 + q_2)        (2.17)
Additionally, let π1 and π2 be the steady state ON probability of channel 1 and
channel 2 respectively.
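For reference, the k-step probabilities in (2.16)-(2.17) can be computed directly; this small helper (ours, not the thesis's) returns both entries for a channel with parameters (p, q):

```python
def k_step(p, q, k):
    """k-step transition probabilities (ON->ON, OFF->ON) of a two-state
    Markov chain with P(OFF->ON) = p and P(ON->OFF) = q, per (2.16)-(2.17)."""
    r = (1 - p - q) ** k
    return (p + q * r) / (p + q), (p - p * r) / (p + q)
```

Both entries converge to the steady-state ON probability π = p/(p + q) as k grows, at a rate governed by 1 − p − q.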
Intuitively, it is optimal to probe the channel with more memory, as that probe
yields more information. For example, consider a channel that varies rapidly, with
p_1 = q_1 = 1/2 − ε, and a channel which rarely changes state, with p_2 = q_2 = ε.
Probing the low-memory channel provides accurate information for a few time slots,
but that information quickly becomes stale, and the transmitter effectively guesses
which channel is ON until the next probe. On the other hand, probing the high-memory
channel yields information that remains accurate for many time slots after
the probe. This intuition is confirmed in the following result.
Theorem 3. For a two-user system with channel states evolving as in Figure 2-5,
and probing instances fixed to intervals of k slots, if p_1, p_2, q_1, q_2 satisfy

b_{11}^i ≥ a_{11}^i    ∀i,        (2.18)
then, the optimal probing policy is to probe channel 2 at all probing instances.
The proof of Theorem 3 is given in the Appendix, and follows by reverse induction
over the channel probing instances. To highlight its significance, we present the
following corollaries.
Corollary 2. Assume the two channels satisfy π1 = π2 , and that p1 + q1 ≥ p2 + q2 .
Then, the optimal policy is to always probe channel 2.
Proof. We can rewrite the k-step transition probability of the second chain from (2.2)
as follows.
b_{11}^i = [ p_2 + q_2 (1 − p_2 − q_2)^i ] / (p_2 + q_2)
        = π_2 + (1 − π_2)(1 − p_2 − q_2)^i        (2.19)
        = π_1 + (1 − π_1)(1 − p_2 − q_2)^i        (2.20)
        ≥ π_1 + (1 − π_1)(1 − p_1 − q_1)^i        (2.21)
        = a_{11}^i        (2.22)
where (2.20) follows from the assumption that π1 = π2 , and (2.21) follows from the
assumption that p_1 + q_1 ≥ p_2 + q_2. Therefore, b_{11}^i ≥ a_{11}^i, and applying Theorem 3
concludes the proof.
Corollary 3. Assume the two channels satisfy p1 + q1 = p2 + q2 , and that π1 ≤ π2 .
Then, the optimal policy is to always probe channel 2.
Proof. We can rewrite the k-step transition probability of the second chain from (2.2)
as follows.
b_{10}^i = q_2 (1 − (1 − p_2 − q_2)^i) / (p_2 + q_2)
        = (1 − π_2)(1 − (1 − p_2 − q_2)^i)        (2.23)
        = (1 − π_2)(1 − (1 − p_1 − q_1)^i)        (2.24)
        ≤ (1 − π_1)(1 − (1 − p_1 − q_1)^i)        (2.25)
        = a_{10}^i        (2.26)
where (2.24) follows from the assumption that p_1 + q_1 = p_2 + q_2, and the inequality
in (2.25) follows from the assumption that π_1 ≤ π_2. Since b_{10}^i ≤ a_{10}^i, then b_{11}^i ≥ a_{11}^i, and
Theorem 3 can be applied to complete the proof.
The above two corollaries describe scenarios where asymmetries in the channel
statistics result in the optimal policy of always probing one of the two channels.
This is in contrast to Theorem 1 where the channels are homogeneous, and probing
either channel yields the same throughput. Corollary 2 states that if the channels are
equally likely to be ON in steady state, the optimal decision is to probe the channel
with the smaller p_i + q_i. In this context, p_i + q_i is the rate at which the channel
approaches the steady state. In particular, the Markov channel state approaches
its stationary distribution exponentially at a rate equal to the second eigenvalue of
the transition probability matrix, which for a two-state chain is 1 − p − q [21]. The
channel which approaches steady state more slowly is the channel with more memory,
thus confirming our intuition that probing the channel with more memory is always
optimal. Corollary 3 applies to a system in which the rate at which the steady state
is reached is the same for both channels, but channel 2 is more likely to be ON in
steady state than channel 1. In this case, it is optimal to probe the channel with the
highest steady state probability of being ON at all probing instances.
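Condition (2.18) is easy to check numerically. As a sanity check (our sketch), the parameters p_1 = 0.3, q_1 = 0.1, p_2 = 0.15, q_2 = 0.05, which satisfy the hypotheses of Corollary 2, indeed give b_{11}^i ≥ a_{11}^i for every horizon tried:

```python
def on_on(p, q, i):
    """i-step ON->ON transition probability of a two-state chain."""
    return (p + q * (1 - p - q) ** i) / (p + q)

# Corollary 2 setting: pi1 = pi2 = 0.75, but p1 + q1 = 0.4 > p2 + q2 = 0.2,
# so channel 2 has more memory and condition (2.18) holds.
p1, q1 = 0.3, 0.1
p2, q2 = 0.15, 0.05
assert abs(p1 / (p1 + q1) - p2 / (p2 + q2)) < 1e-12   # equal steady states
assert all(on_on(p2, q2, i) >= on_on(p1, q1, i) for i in range(1, 200))
```

This is only a spot check, of course; the Appendix proof covers all i.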
2.2.2 Simulation Results
We simulate the evolution of a two-channel system over time, and compare different
fixed probing policies in terms of average throughput. We assume a time horizon
of 2,000,000 probes, and assume a probe occurs every 6 slots. We consider five
deterministic stationary channel probing policies: probe channel 1 always, probe
Simulation            p1 = q1 = 0.1    p1 = 0.3, q1 = 0.1     p1 = q1 = 0.1
                      p2 = q2 = 0.1    p2 = 0.15, q2 = 0.05   p2 = 0.15, q2 = 0.05
Probe Channel 1       0.6536           0.8240                 0.7899
Probe Channel 2       0.6540           0.8652                 0.8027
Probe Best Channel    0.6538           0.8450                 0.8030
Probe Worst Channel   0.6538           0.8402                 0.7902
Round Robin           0.6532           0.8452                 0.7981

Table 2.1: Comparison of different probing policies for a two-channel system for a fixed
probing interval (6) and time horizon 2,000,000.
channel 2 always, probe the channel with the higher belief, probe the channel with
the lower belief, and alternate between the channels (round robin). The average
throughput under each of these policies is shown in Table 2.1. The first column of
Table 2.1 shows that for a system with two i.i.d. channels with parameters p = q =
0.1, the choice of channel probing policy does not affect the average reward earned
by the system, as predicted by Theorem 1.
Additionally, we simulate a system with two statistically different channels. These
results are shown in the second and third columns of Table 2.1. The first simulation
(column 2) uses two channels with the same steady state probability (π = 0.75), but
with channel 1 approaching steady state at a faster rate than channel 2. By Corollary
2, the optimal probing policy is to always probe channel 2, which is consistent with the
simulation. The second simulation (column 3) uses two channels satisfying p1 + q1 =
p2 + q2 = 0.2, and π2 > π1 , as in Corollary 3. As expected, probing channel two is
optimal (after accounting for noise in the simulation measurements). In this case,
probing the channel with the higher belief is a good policy, since the channel with
the higher steady state probability has a higher belief more often.
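The two-channel comparison in Table 2.1 can be reproduced with a short Monte Carlo sketch (ours; the thesis's simulation code is not shown), which tracks the true channel states and updates beliefs between probes with the one-step rule b ← b(1 − q) + (1 − b)p:

```python
import random

def step(state, p, q, rng):
    """Advance one channel by one slot; p = P(OFF->ON), q = P(ON->OFF)."""
    if state == 1:
        return 0 if rng.random() < q else 1
    return 1 if rng.random() < p else 0

def throughput(choose, params, k=6, probes=20000, seed=1):
    """Average per-slot throughput of a fixed-interval probing policy.
    `choose(beliefs)` returns the index of the channel to probe."""
    rng = random.Random(seed)
    states = [1, 1]
    beliefs = [p / (p + q) for p, q in params]   # start at steady state
    reward = slots = 0
    for _ in range(probes):
        i = choose(beliefs)
        beliefs[i] = float(states[i])            # the probe reveals the true state
        use = i if states[i] == 1 else 1 - i     # transmit on the best believed channel
        for _ in range(k):
            reward += states[use]
            slots += 1
            for c, (p, q) in enumerate(params):
                states[c] = step(states[c], p, q, rng)
                beliefs[c] = beliefs[c] * (1 - q) + (1 - beliefs[c]) * p
    return reward / slots
```

With params = [(0.3, 0.1), (0.15, 0.05)] (the second column of Table 2.1), always probing channel 2 comes out ahead of always probing channel 1 in our runs, as Corollary 2 predicts.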
Figure 2-6 plots the throughput obtained by the policy which always probes channel 1 versus the policy that always probes channel 2 for a sample set of parameters.
For the second channel, p_2 = 1/4 and q_2 = 1/12, so that π_2 = 3/4. For channel 1, π_1 is fixed
at 3/4, but p_1 is varied from 0 to 1/2. When channel 1 has more memory than channel
2, probing channel 1 yields much higher throughput than the alternative. In this
example, when p_1 is very small and channel 1 has a high degree of memory, probing
channel 1 results in a 15% throughput improvement over probing channel 2.
Figure 2-6: Throughput of the 'Probe Channel 1' policy and the 'Probe Channel 2' policy. In this
example, p_1 is varied from 0 to 1/2, and q_1 is chosen so that π_1 = 3/4. The second channel satisfies
p_2 = 1/4 and q_2 = 1/12, resulting in π_2 = π_1.
Theorem 1 and Theorem 3 describe scenarios in which probing one of the two
channels at all probing instances is optimal. The simplicity of the optimal probing
policy in these cases is an artifact of the transmitter having only two channels from
which to choose. As the number of channels increases, a policy that always probes one
of the channels is suboptimal. Therefore, additional analysis is required for a system
with more than two channels.
2.3 Optimal Channel Probing over Finitely Many Channels
As mentioned above, for systems with more channels, i.e. M > 2, the policy of
always probing one of the channels is suboptimal. In particular, the optimal probing
policy is a function of the beliefs of the channels. In this section, we show that the
policy which probes the channel with the second highest belief is optimal for a system
of three channels, and conjecture an extension to a general system of finitely many
channels.
2.3.1 Three Channel System
To begin, consider a system of three channels, with channel states identically distributed according to the Markov chain in Figure 2-2. The following result characterizes the optimal channel probing policy as a function of the beliefs of the three
channels.
Theorem 4. In a system of three channels, where a single channel is probed every k
slots, the optimal probing policy is to probe the channel with the second-highest belief.
Denote by xi the belief of the channel with the ith largest belief. Thus, x1 ≥ x2 ≥
x3 . The probe second-best policy probes the channel with belief x2 . If that channel
is ON, the transmitter uses that channel to transmit over for the next k slots. After
these k slots, the best channel is the channel that was last probed, with belief τ^k(1),
where τ^k is the information-decay function defined in (2.5). If, on the other hand, the
probed channel is OFF, the transmitter will use the channel with the highest belief
among the remaining channels, x_1. After k slots, that channel will have belief τ^k(x_1),
and the belief of the probed channel will be the smallest, at τ^k(0).
Define a function W_n as follows:

W_n(x_1, x_2, x_3) ≜ f^k(x_1, x_2) + x_2 W_{n+1}( τ^k(1), τ^k(x_1), τ^k(x_3) ) + (1 − x_2) W_{n+1}( τ^k(x_1), τ^k(x_3), τ^k(0) )        (2.27)

for all 0 ≤ n ≤ N, where f^k(·) is the immediate reward function defined in (2.7).
Let W_{N+1}(x_1, x_2, x_3) = 0 by convention. Note that W_n(x_1, x_2, x_3) is the expected
throughput of the probe second-best policy from time n onwards if and only if x_1 ≥
x_2 ≥ x_3. Additionally, if x_2 ≥ x_1 ≥ x_3, then W_n(x_1, x_2, x_3) is the expected reward of
the policy which probes the channel with the highest belief at index n, and then probes
the channel with the second highest belief at all subsequent times. The following
results hold for this definition of Wn , and are used to prove Theorem 4.
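The recursion (2.27) can be evaluated numerically for short horizons. The sketch below (ours) assumes τ^n(x) = π + (x − π)(1 − p − q)^n for the decay function of (2.5), and takes f^k(x_1, x_2) to be the expected k-slot reward after probing the belief-x_2 channel, which we believe matches (2.7); under these assumptions it lets one spot-check Lemmas 2 and 3 on sample beliefs:

```python
def make_W(p, q, k, N):
    """Build the finite-horizon recursion W_n of (2.27) for probing interval k."""
    pi = p / (p + q)

    def tau(x, n):
        """n-step belief decay: assumed form of the function in (2.5)."""
        return pi + (x - pi) * (1 - p - q) ** n

    def f(x1, x2):
        """Expected reward over k slots after probing the belief-x2 channel;
        transmit on it if ON, otherwise on the belief-x1 channel (assumed (2.7))."""
        return (x2 * sum(tau(1.0, i) for i in range(k))
                + (1 - x2) * sum(tau(x1, i) for i in range(k)))

    def W(x, n=0):
        """Probe the channel in the second position of x, as in (2.27)."""
        if n > N:
            return 0.0
        x1, x2, x3 = x
        tk = lambda y: tau(y, k)
        return (f(x1, x2)
                + x2 * W((tk(1.0), tk(x1), tk(x3)), n + 1)
                + (1 - x2) * W((tk(x1), tk(x3), tk(0.0)), n + 1))

    return W
```

For sorted beliefs x_1 ≥ x_2 ≥ x_3, every sample point we tried satisfies W(x_1, x_2, x_3) ≥ W(x_2, x_1, x_3) and W(x_1, x_2, x_3) ≥ W(x_1, x_3, x_2), consistent with Lemmas 2 and 3.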
Lemma 2. If x_1 ≥ x_2 ≥ x_3, then for all 0 ≤ n ≤ N,

W_n(x_1, x_2, x_3) ≥ W_n(x_2, x_1, x_3)        (2.28)

Lemma 3. If x_1 ≥ x_2 ≥ x_3, then for all 0 ≤ n ≤ N,

W_n(x_1, x_2, x_3) ≥ W_n(x_1, x_3, x_2)        (2.29)
The proofs of Lemmas 2 and 3 are given in the Appendix.
Proof of Theorem 4. Without loss of generality, assume the beliefs of the three channels x1 , x2 , x3 satisfy x1 ≥ x2 ≥ x3 . The proof follows using reverse induction
on the probing index n. For n = N , probing the best channel yields throughput
WN (x2 , x1 , x3 ), while probing the second and third best channels yields throughput WN (x1 , x2 , x3 ) and WN (x1 , x3 , x2 ) respectively. By Lemma 2, WN (x1 , x2 , x3 ) ≥
WN (x2 , x1 , x3 ), and by Lemma 3, WN (x1 , x2 , x3 ) ≥ WN (x1 , x3 , x2 ); therefore, probing
the second-best channel is optimal at n = N .
Now assume it is optimal to probe the second-best channel at probes n + 1, . . . , N .
At probing instance n, the throughput of the three potential choices of channels
are given by Wn (x2 , x1 , x3 ), Wn (x1 , x2 , x3 ), and Wn (x1 , x3 , x2 ) for probing the best,
second-best, and third best channels respectively. By Lemma 2, Wn (x1 , x2 , x3 ) ≥
Wn (x2 , x1 , x3 ), and by Lemma 3, Wn (x1 , x2 , x3 ) ≥ Wn (x1 , x3 , x2 ); therefore, probing
the second-best channel is optimal at n as well. By induction, probing the second-best
channel is optimal at all probing times.
This result is exciting as it contradicts the previous result in [2] which stated
that the policy which probes the best channel is optimal for the model in which the
transmitter must use the channel that was probed for transmission. In our model, the
transmitter can collect CSI separately from the transmission decision, and therefore
probing the second-best channel yields a higher throughput. Further intuition as to
why the probe second-best policy is optimal is presented in Section 2.4.2.
2.3.2 Arbitrary Number of Channels
Theorem 4 shows that the probe second-best policy is optimal for a system of three
channels. In general, for M > 3, we conjecture that the probe second-best policy
remains optimal.
Conjecture 1. The probe second-best policy is optimal among all channel probing
policies for fixed probing intervals k.
The proof used for the M = 3 channel case does not extend to M ≥ 4. In [2], the
authors used a coupling argument to circumvent this issue and prove the optimality
of the myopic policy for their setting for general networks. However, due to the
additional complexity of the probe second-best policy, this coupling argument does
not hold in our setting. Instead, we believe the general case can be proven by bounding
the maximum difference in expected reward from being in a better state after probing
the kth best channel for k ≥ 2, and proving that this extra reward must be less than
the gain in the immediate expected reward that probing the second-best channel
offers.
2.3.3 Simulation Results
For a system with more than two channels, we can compare various probing policies
to show support for Conjecture 1, shown in Table 2.2. In addition to the policies
considered in the previous section, we include the optimal probe second-best policy,
and a policy probing the third-best channel for comparison. In all scenarios, the
probe second-best policy outperforms the other probing policies, thus supporting our
conjecture. However, the advantage of using the probe second-best policy over similar
policies, such as probe best and probe third best, is relatively small.
In Figure 2-7, we compare the performance of the probe-best policy, the probe
second-best policy, and probe third-best policy as a function of the number of channels
in the system, for a fixed probing interval. We see that as the number of channels
grows, the gap in performance between the probe second-best policy and the probe-best
policy increases. Furthermore, the probe third-best policy becomes more
Simulation            3 Channels   5 Channels   7 Channels   10 Channels
Probe Channel 1       0.6955       0.6959       0.6957       0.6958
Probe Best Channel    0.7455       0.7640       0.7650       0.7659
Probe Second-Best     0.7553       0.7787       0.7799       0.7808
Probe Third Best      0.6849       0.7617       0.7691       0.7706
Probe Worst           0.6860       0.6804       0.6810       0.6806
Round Robin           0.7460       0.7649       0.7658       0.7661

Table 2.2: Comparison of different probing policies for a fixed probing interval (6) and time
horizon 2,000,000. State transition probability p = q = 0.05.
Figure 2-7: Comparison of the probe best policy, the probe second-best policy, and the probe
third best policy as a function of the number of channels in the system. This simulation
was run over 2 million probes, with each probe being at an interval of 4 time slots.
efficient as the number of channels increases, but does not reach the throughput level
of the probe second-best policy.
2.4 Infinite-Channel System
As the number of channels increases, the state space grows large and the probing
formulation becomes more difficult to analyze. However, as the number of channels
grows to infinity, the state space of the system can be simplified. For an infinite
channel system, whenever a probed channel is OFF, it is effectively removed from the
system. This is because there always exists a channel which has not been probed in
the previous N slots, for any finite N , and thus its belief is equal to the steady state
ON probability π. Therefore, since an OFF channel has belief p_{01}^k ≤ π for any finite
k, it will never be optimal to transmit over that channel.
In this section, we use the infinite channel assumption to characterize the average
throughput under several probing policies. We consider the myopic policy which is
shown to be optimal for the model in [2, 71], as well as a round robin policy which
probes channels sequentially. In addition, we characterize the throughput of the
probe second-best policy, which is conjectured to be the optimal probing policy for a
finite number of channels in Section 2.3, and prove that it outperforms the other two
policies in this setting.
2.4.1 Probe-Best Policy
To begin, consider the probe-best policy, which probes the channel with the highest
belief. This policy is commonly referred to as a myopic or greedy policy, as it maximizes the immediate reward without regard to future rewards. Intuitively, such a
policy is advantageous as the channel with the highest belief is the most likely to be
ON at the current time, yielding the highest expected throughput. Recall that this
policy is shown to be optimal for the model in [2, 71]. For our model, we have the
following results.
Theorem 5. The state of the system is given by an infinite vector of beliefs for each
channel. Without loss of generality, assume this vector is sorted as x = {x1 , x2 , . . .}
such that x1 ≥ x2 ≥ x3 . . .. The class of recurrent states under the probe-best policy
satisfy x_1 ≥ π, and x_i = π for all other channels i ≠ 1.
Proof. The probe best policy probes the channel with belief x_1. If this channel is ON,
its belief becomes p_{11}^1 in the next slot, and it remains the channel with the highest
belief by the equality in (2.3). If that channel is OFF, it is removed from the system
as per the infinite channel assumption. Therefore, the vector consisting of x_i = π for
all i is reachable from any state. This state corresponds to the transmitter having
no information about the network. The only other state reachable from this state is
reached when an ON channel is found, at which point, the state returns to a state
satisfying x_1 ≥ π, and x_i = π ∀i ≠ 1.
Theorem 6. Assume the transmitter makes probing decisions every k slots according
to the probe best policy. The expected per-slot throughput is given by
E[Thpt] = π +
πpk10
k(p + q)(pk10 t + π)
(2.30)
Proof. We use renewal theory to compute the average throughput. Under the probe
best policy, Theorem 5 states that only one channel can have belief greater than
π. Define a renewal to occur immediately prior to probing a channel with belief π.
Therefore, if a channel is probed and if it is OFF, it is removed from the system and
a renewal occurs k slots later (before the next probe). If the channel is ON, that
channel is probed at all future probing instances until it is found to be OFF. The
expected inter-renewal time X B is given by
X B = (1 − π)k + π(kE(N ) + k)
= k + kπE(N )
(2.31)
(2.32)
where N is a random variable denoting the number of times an ON channel is probed
before it is OFF, and is geometrically distributed with parameter pk10 . Equation (2.32)
reduces to
XB = k +
πk
.
pk10
(2.33)
The expected reward R̄_B incurred over a renewal interval is πk for the interval
immediately after the OFF probe, and Σ_{i=0}^{k−1} p_{11}^i for each subsequent ON probe. If the
first probe is ON, then there will be N probes until the final OFF probe. Thus, the
expected accumulated reward over a renewal interval is expressed as

R̄_B = (1 − π)πk + π( πk + E[N] Σ_{i=0}^{k−1} p_{11}^i )        (2.34)
    = πk + π E[N] Σ_{i=0}^{k−1} p_{11}^i = πk + π Σ_{i=0}^{k−1} p_{11}^i / p_{10}^k.        (2.35)
Using results from renewal-reward theory [20], the average per-slot reward is given
by the ratio of the expected reward over the renewal interval divided by the expected
length of that interval.
R̄_B / X̄_B = ( πk p_{10}^k + π Σ_{i=0}^{k−1} p_{11}^i ) / ( k p_{10}^k + πk ) = π + π p_{10}^k / ( k(p + q)(p_{10}^k + π) )        (2.36)
Observe that the per-slot throughput is always larger than π, and decreases toward
π as k increases. The probe best policy maximizes the immediate reward; however,
the drawback of this policy is that when the probed channel is OFF, the transmitter
has no knowledge of the state of the other channels as it searches for an ON channel,
as described by Theorem 5. Consequently, the transmitter probes channels with belief π
until an ON channel is found, resulting in a low expected reward.
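The closed form (2.30) is straightforward to evaluate; this helper (ours, not the thesis's code) reproduces the probe-best "Theory" entry of Table 2.4:

```python
def probe_best_throughput(p, q, k):
    """Per-slot throughput of the probe-best policy with infinitely many
    channels and a fixed probing interval k, per (2.30)/(2.36)."""
    pi = p / (p + q)
    p10k = q * (1 - (1 - p - q) ** k) / (p + q)   # k-step ON->OFF probability
    return pi + pi * p10k / (k * (p + q) * (p10k + pi))
```

For example, probe_best_throughput(0.05, 0.05, 6) is roughly 0.7659, matching Table 2.4, and the value decays toward π = 0.5 as k grows.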
2.4.2 Probe Second-Best Policy
Now, consider a simple alternative policy, the probe second-best policy, which at
each time slot probes the channel with the second-highest belief, and transmits on
the channel with the highest belief after the channel probe. Consider channel state
beliefs x_1, x_2, x_3, … where x_1 ≥ x_2 ≥ … ≥ x_i ≥ … ≥ π. The probe-best policy of
Section 2.4.1 probes the channel with belief x1 . If it is ON, the transmitter uses
that channel (resulting in throughput equal to 1 for the next slot) and if it is OFF,
the transmitter uses the channel with the next highest belief x2 . Thus, the expected
immediate reward of probing the best channel is given by

x_1 + (1 − x_1) x_2 = x_1 + x_2 − x_1 x_2.        (2.37)
The probe second-best policy instead probes the channel with belief equal to x2 . If
this channel is ON, it transmits over that channel (resulting in throughput equal to
1) and otherwise transmits over the channel with highest belief, x1 . The expected
immediate reward of probing the second-best channel is equal to

x_2 + (1 − x_2) x_1 = x_1 + x_2 − x_1 x_2.        (2.38)
Hence, the probe second-best policy has the same immediate reward as the probe
best policy. To understand how the probe second-best policy outperforms the probe-best
policy.
Theorem 7. The state of the system is given by an infinite vector of beliefs for each
channel. Without loss of generality, assume this vector is sorted as x = {x1 , x2 , . . .}
such that x1 ≥ x2 ≥ x3 . . .. The class of recurrent states under the probe second-best
policy satisfy x_1 ≥ x_2 ≥ π, and x_i = π for all other channels i ≠ 1, 2.
Proof. The probe second-best policy probes the channel with belief x2 . If this channel
is ON, its belief becomes p_{11}^k at the next probe, and it becomes the channel with
the highest belief, while x1 becomes the second highest belief. If the channel is
OFF instead, it is removed from the system as per the infinite channel assumption.
Therefore, the vector consisting of x1 ≥ π and xi = π for all i > 1 is reachable from
any state. This state corresponds to the transmitter having information of only one
channel. From this state, by probing an ON channel, the system transitions into a
state with two channels having belief greater than π; however, the system can never
have more than two channels with xi > π.
By Theorem 7, since two channels can have belief greater than π under the probe
second-best policy, when the probe second-best policy probes an OFF channel, the
transmitter uses the channel with the next highest belief, while probing new channels
to find another ON channel. This approach results in a higher expected throughput
over that interval than under the probe best policy, which transmits on a channel
with belief equal to the steady state probability π. It is this intuition that leads us to
consider the probe second-best policy. The following theorem confirms our intuition,
by showing that the probe second-best policy yields a higher throughput than the
probe best policy.
Figure 2-8: Illustration of renewal process. Points represent probing instances, and labels
represent probing results. Each renewal interval consists of phase 1, and phase 2.
Theorem 8. The average reward of the probe second-best policy is greater than that
of the probe best policy, for all fixed probing intervals k.
Proof. Theorem 8 is proved using renewal theory to compute the average throughput
of the probe second-best policy, and comparing it to that of the probe best policy.
The key to the proof is in the definition of the renewal interval. We define a renewal
to occur when the best channel has belief p_{11}^{2k}, and the second-best channel (and every
other channel) has belief π. A renewal interval is divided into two phases: Phase 1
includes all the channel probes until a probe results in an ON channel, and phase
2 includes the subsequent probes until an OFF channel is probed. The division of
renewal intervals into phases is illustrated in Figure 2-8. In Phase 1, the transmitter
probes channels with belief π until an ON channel is probed, and in phase 2, the
transmitter probes the second-best channel with belief greater than π until an OFF
channel is probed. This definition ensures that the inter-renewal periods are i.i.d.
The state evolution during a sample renewal interval is shown in Table 2.3.
The expected inter-renewal time is given by kE(N1 + N2 ), where N1 is the number
of probes required to find an ON channel in phase 1, and is geometrically distributed
with parameter π, and N2 is the number of probes required until the next OFF
probe in phase 2. The distribution of N2 is dependent on N1 , and has the following
Time    Best Channel Belief    Second-Best Belief    Probe Result
0       p_{11}^{2k}            π                     0
k       p_{11}^{3k}            π                     0
2k      p_{11}^{4k}            π                     1
3k      p_{11}^k               p_{11}^{5k}           1
4k      p_{11}^k               p_{11}^{2k}           1
5k      p_{11}^k               p_{11}^{2k}           0
6k      p_{11}^{2k}            π                     -

Table 2.3: Example renewal interval starting at time 0 and renewing at time 6k. At each
probing interval, the second-best channel is probed.
distribution function.
N_2 = 1    w.p. p_{10}^{(N_1+2)k}
N_2 = i    w.p. p_{11}^{(N_1+2)k} p_{10}^{2k} (p_{11}^{2k})^{i−2},    i ≥ 2        (2.39)

Therefore,

X̄_SB = kE[N_1 + N_2] = k ( 1/π + 1 + E[ p_{11}^{(2+N_1)k} ] / p_{10}^{2k} )        (2.40)
During phase 1 of a renewal, the expected reward accumulated is given by
R̄_SB^1 = E[ Σ_{i=0}^{(N_1−1)k−1} p_{11}^{i+2k} + Σ_{i=0}^{k−1} p_{11}^i ].        (2.41)
The first term is the throughput obtained from transmitting over the best channel
while looking for an ON channel, which starts with belief p_{11}^{2k} and decays until an ON
channel is found, as shown in Table 2.3. In phase 2, the expected reward is given by
R̄_SB^2 = E[ (N_2 − 1) Σ_{i=0}^{k−1} p_{11}^i + Σ_{i=0}^{k−1} p_{11}^{k+i} ].        (2.42)
For N_2 − 1 intervals of length k, the transmitter will transmit over a channel that was
ON, yielding throughput Σ_{i=0}^{k−1} p_{11}^i. Then, for the last interval prior to the renewal,
the best channel has belief p_{11}^k, and the expected accumulated throughput over that
interval is Σ_{i=0}^{k−1} p_{11}^{k+i}. The average reward per time slot is given by
( R̄_SB^1 + R̄_SB^2 ) / X̄_SB = π + π p_{10}^k (π + p_{10}^{2k}) / ( (p + q) k [ π^2 + p_{10}^{2k} (1 − (1 − p − q)^k + π) ] )        (2.43)
Figure 2-9: Comparison of the probe best policy and the probe second-best policy for
varying probing intervals k. In this example, p = q = 0.05.
We can compute the difference between (2.43) and (2.30) from Theorem 6 as
( R̄_SB^1 + R̄_SB^2 ) / X̄_SB − R̄_B / X̄_B = ( (1 − p − q)^k π p_{10}^k )^2 / ( k(p + q)(π + p_{10}^k)( π^2 + p_{10}^{2k} (π + 1 − (1 − p − q)^k) ) )        (2.44)
Due to the positive memory assumption, we have 0 ≤ (1 − p − q)^k ≤ 1 for all k.
Therefore, the expression in (2.44) is positive, completing the proof.
Theorem 8 asserts that probing the channel with the second highest belief is a
better policy than probing the channel with the highest belief under fixed-interval
probing policies. A numerical comparison between these two policies is shown in
Figure 2-9. This result is in sharp contrast to the result in [2] that shows that
probing the channel with the highest belief is optimal. In our model, when a probed
channel is OFF, the transmitter uses its knowledge of the system to transmit over
another channel believed to be ON. In the model of [2], when an OFF channel is
probed, the transmitter cannot schedule a packet in that slot. This difference in the
reward after probing leads to significantly different probing policies. This result also
supports Conjecture 1, claiming that the probe second-best policy is optimal among
all policies.
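The gap in Theorem 8 can be spot-checked numerically. The sketch below (ours) evaluates the closed forms (2.30) and (2.43) together with the right-hand side of (2.44); the difference of the two throughputs matches (2.44) and is positive whenever 1 − p − q > 0:

```python
def policies_gap(p, q, k):
    """Return (probe-best, probe second-best, RHS of (2.44)) for interval k."""
    pi = p / (p + q)
    r = (1 - p - q) ** k
    p10k = q * (1 - r) / (p + q)            # k-step ON->OFF probability
    p10_2k = q * (1 - r * r) / (p + q)      # 2k-step ON->OFF probability
    best = pi + pi * p10k / (k * (p + q) * (p10k + pi))                     # (2.30)
    second = pi + pi * p10k * (pi + p10_2k) / (
        (p + q) * k * (pi ** 2 + p10_2k * (1 - r + pi)))                    # (2.43)
    gap = (r * pi * p10k) ** 2 / (
        k * (p + q) * (pi + p10k) * (pi ** 2 + p10_2k * (pi + 1 - r)))      # (2.44)
    return best, second, gap
```

For p = q = 0.05 and k = 6 this gives roughly 0.7659, 0.7806, and a gap of about 0.0147, consistent with the "Theory" column of Table 2.4.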
Figure 2-10: Comparison of the probe best policy and the probe second-best policy for
varying state transition probabilities p = q. In this example, k = 1.
2.4.3 Round Robin Policy
It is of additional interest to consider a min-max policy, the round robin policy, which
probes the channel for which the transmitter has the least knowledge. In a system with
finitely many channels, the round robin policy probes all of the channels sequentially,
always probing the channel which was probed longest ago. When the number of
channels grows to infinity, the transmitter always probes a channel that has previously
never been probed. Consider channel state beliefs x1 , x2 , x3 , . . . where x1 ≥ x2 . . . ≥
xi . . . ≥ π. Under the round robin policy, a channel with belief π is probed; if that
channel is ON it will be used by the transmitter (earning throughput 1) and otherwise
the channel with the highest belief will be used (earning throughput x1 , the belief of
the best channel). Thus, the immediate reward of round robin is given by:

π + (1 − π) x_1 = π + x_1 − π x_1.        (2.45)
By comparing (2.45) to (2.37), it is clear the immediate reward of the round robin
policy is less than that of the probe best and the probe second-best policies. Interestingly, the following theorem shows that the average per-slot throughput is the same
for the round robin policy as for the myopic probe best policy.
Theorem 9. For all fixed k, the round robin policy has a per-slot average throughput
of

E[Thpt] = π + π p_{10}^k / ( k(p + q)(p_{10}^k + π) ),        (2.46)
the same as the probe best policy.
Proof. Let a renewal occur every time a new channel is probed and found to be ON.
Since the result of each probe is an i.i.d. random variable with parameter π, the
inter-renewal intervals are i.i.d. The inter-renewal time XRR = k · N , where k is the
time between probes, and N is a geometric random variable with parameter π, as
defined in (2.4). Over that interval, the transmitter transmits over the last channel
known to be ON, until a new ON channel is found. The expected reward earned over
each renewal period is given by
R̄_{RR} = E[ Σ_{i=0}^{Nk−1} p_{11}^i ]    (2.47)
       = E[ πNk + p_{10}^{Nk} / (p + q) ]    (2.48)
       = k + p_{10}^k / (p + q − q(1 − p − q)^k).    (2.49)
Thus, the time-average reward is given by

R̄_{RR} / X̄_{RR} = π + πp_{10}^k / (k(p + q)(π + p_{10}^k)),    (2.50)
which is the same as the reward of the probe best policy in Theorem 6.
Recall from Theorem 5, that under the probe best policy, at most one channel can
have belief greater than π. In contrast, under the round robin policy many channels
can have belief greater than π. Thus, Theorem 9 is surprising, since the round robin
policy trades off immediate reward for increasing knowledge of the channel states,
but yields the same average throughput as the probe best policy.
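The closed-form expression in (2.46) is easy to evaluate numerically. The sketch below (function and variable names are ours, not the thesis's) computes it directly; for p = q = 0.05 and k = 6 it reproduces the probe best / round robin value reported later in Table 2.4.

```python
def per_slot_throughput(p, q, k):
    """Average per-slot throughput of round robin (equivalently, probe best)
    probing at fixed intervals of k slots, per Eq. (2.46)."""
    pi = p / (p + q)                              # stationary P(channel ON)
    p10k = (1 - pi) * (1 - (1 - p - q) ** k)      # P(OFF k slots after an ON probe)
    return pi + pi * p10k / (k * (p + q) * (p10k + pi))

print(round(per_slot_throughput(0.05, 0.05, 6), 4))  # -> 0.7659
```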
Policy              Theory    Simulation
Probe Best          0.7659    0.7657
Probe Second-Best   0.7806    0.7806
Round Robin         0.7659    0.7662

Table 2.4: Throughput comparison for different probing policies with p = q = 0.05, k = 6. Simulation assumes 500 channels and a time horizon of 1,000,000 probes.
2.4.4 Simulation Results
In order to simulate an infinite-channel system, we consider a system of 500 channels
and apply different probing policies at a fixed probing interval of 6 slots. We compute
the average throughput obtained over the total horizon, as shown in Table 2.4. In this
simulation, the probe second-best policy is optimal over all policies considered, while
the probe best policy and round robin policies have the same average throughput.
Additionally, we can see that the analytical throughput derived in Section 2.4 is very
close to that observed through simulation.
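A Monte Carlo version of this experiment is short to write. Under the infinite-channel approximation, each round robin probe hits a fresh channel whose state is an independent Bernoulli(π) draw, and the transmitter rides the last channel found ON between probes; the horizon and seed below are our own illustrative choices rather than the exact setup of Table 2.4.

```python
import random

def simulate_round_robin(p, q, k, horizon, seed=0):
    """Monte Carlo throughput estimate for round robin probing when every
    probe targets a never-before-probed channel (infinite-channel model)."""
    rng = random.Random(seed)
    pi = p / (p + q)
    s = 1 if rng.random() < pi else 0       # true state of the channel in use
    total = 0
    for t in range(horizon):
        if t % k == 0 and rng.random() < pi:
            s = 1                           # fresh probe found an ON channel; switch
        total += s                          # reward = actual state of the used channel
        if s == 1:                          # evolve the used channel one slot
            s = 0 if rng.random() < q else 1
        else:
            s = 1 if rng.random() < p else 0
    return total / horizon

est = simulate_round_robin(0.05, 0.05, 6, horizon=600_000)
```

With p = q = 0.05 and k = 6, the estimate should land near the 0.766 predicted by (2.46).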
2.5 Dynamic Optimization of Probing Intervals
Until this point, we have assumed the transmitter chooses channels to probe at predetermined probing intervals. However, an alternate approach is to optimize the time
until the next channel probe dynamically, as a function of the collected CSI. For example, after an ON probe, the transmitter has knowledge of a channel yielding high
throughput, and therefore may not need to probe a new channel immediately. On the
other hand, if that probed channel is OFF, the transmitter may benefit from probing
a new channel in the near future to make up for lost throughput. In this example, the
optimal probing policy sets the probing interval dynamically, based on the results of
the previous probe. In this section, the optimal dynamic probing policy is modeled
as a stochastic control problem, where at each time slot, a decision is made whether
to probe a channel or not, and if so, which channel to probe.
2.5.1 Two-Channel System
To begin with, consider a system with only two channels. The optimal channel
probing problem is formulated as a Markov Decision Process (MDP) or a DP over a
finite horizon of length T . At each time slot, the system state is the vector consisting
of the belief of each channel’s state. After observing the system state at time t, the
transmitter selects an action from a set of possible actions: probe channel 1, probe
channel 2, or probe neither channel. Thus, the expected reward function at time slot
t is given by
J_t(x1, x2) = max{ J_t^0(x1, x2), J_t^1(x1, x2), J_t^2(x1, x2) },    (2.51)
where J_t^0 is the expected reward given that neither channel is probed at the current slot, and J_t^1 and J_t^2 are the expected reward functions given that channel 1 or channel 2 is probed, respectively. When the transmitter chooses not to probe either channel,
the throughput obtained is given by the maximum of the channel beliefs, since the
transmitter uses the better of the two channels. Assume channel probes incur a
cost of c. This channel cost represents the resources required to execute a channel
probe, thus taking away from resources which could have been used for additional
throughput. When a channel is probed and is ON, the transmitter uses that channel
and a reward (throughput) of 1 is earned. On the other hand, if the probed channel
is OFF, a unit throughput is earned only if the second channel is ON. Therefore, the
terminal cost at time t = T is given by
J_T^0(x1, x2) = max(x1, x2),    (2.52)
J_T^1(x1, x2) = −c + x1 + (1 − x1)x2,    (2.53)
J_T^2(x1, x2) = −c + x2 + (1 − x2)x1.    (2.54)
For t < T , the reward function includes the expected future reward, based on the
result of the channel probe. If the transmitter does not probe a channel, the state at
the next slot is given by (τ (x1 ), τ (x2 )), where τ (·) = τ 1 (·) is the information decay
function in (2.5). If a channel is probed, then the belief of that channel in the following
slot is either p or 1 − q depending on whether the probe results in an OFF channel or
an ON channel respectively. Thus, the recursive expected reward DP equations are
given by
J_t^0(x1, x2) = max(x1, x2) + J_{t+1}(τ(x1), τ(x2))    (2.55)
J_t^1(x1, x2) = −c + x1 + x2 − x1x2 + x1 J_{t+1}(1 − q, τ(x2)) + (1 − x1)J_{t+1}(p, τ(x2))    (2.56)
J_t^2(x1, x2) = −c + x1 + x2 − x1x2 + x2 J_{t+1}(τ(x1), 1 − q) + (1 − x2)J_{t+1}(τ(x1), p)    (2.57)
The maximizer of (2.51) is the optimal probing policy at time slot t as a function of
the current state. Note that the state space is countably infinite, as each belief xi
has a one-to-one mapping to an (S, k) pair, where S is the state at the last channel
probe, and k is the time since the last probe.
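The recursion (2.51)-(2.57) can be computed directly by memoizing over the reachable belief pairs; the sketch below uses illustrative parameter values of our own choosing.

```python
# Finite-horizon DP for the two-channel probing problem, Eqs. (2.51)-(2.57).
p, q, c, T = 0.2, 0.2, 0.1, 20    # illustrative parameters, not from the text
memo = {}

def tau(x):
    """One-slot belief update when a channel is not observed."""
    return x * (1 - q) + (1 - x) * p

def J(t, x1, x2):
    """Expected reward-to-go from belief state (x1, x2) at slot t, Eq. (2.51)."""
    key = (t, round(x1, 9), round(x2, 9))
    if key in memo:
        return memo[key]
    if t == T:                      # terminal stage, Eqs. (2.52)-(2.54)
        j0 = max(x1, x2)
        j1 = -c + x1 + (1 - x1) * x2
        j2 = -c + x2 + (1 - x2) * x1
    else:                           # Eqs. (2.55)-(2.57)
        j0 = max(x1, x2) + J(t + 1, tau(x1), tau(x2))
        j1 = (-c + x1 + x2 - x1 * x2
              + x1 * J(t + 1, 1 - q, tau(x2)) + (1 - x1) * J(t + 1, p, tau(x2)))
        j2 = (-c + x1 + x2 - x1 * x2
              + x2 * J(t + 1, tau(x1), 1 - q) + (1 - x2) * J(t + 1, tau(x1), p))
    memo[key] = max(j0, j1, j2)
    return memo[key]
```

Lemma 5 provides a convenient sanity check for an implementation like this: the computed value function should be symmetric, J(0, a, b) = J(0, b, a).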
Several observations can be made about the value function described in (2.51)-(2.57), as stated in the following lemmas.
Lemma 4 (Linearity). J_t^1(x1, x2) is linear in x1 for fixed x2, and similarly, J_t^2(x1, x2) is linear in x2 for fixed x1.
Proof. We will prove the first half of this lemma here, and the other half follows by
symmetry. Let 0 ≤ λ ≤ 1.
J_t^1(λx1 + (1 − λ)y1, x2)
= −c + λx1 + (1 − λ)y1 + x2 − (λx1 + (1 − λ)y1)x2
  + (λx1 + (1 − λ)y1)J_{t+1}(1 − q, τ(x2)) + (1 − (λx1 + (1 − λ)y1))J_{t+1}(p, τ(x2))    (2.58)
= λ(−c + x1 + x2 − x1x2) + (1 − λ)(−c + y1 + x2 − y1x2)
  + λ[ x1 J_{t+1}(1 − q, τ(x2)) + (1 − x1)J_{t+1}(p, τ(x2)) ]
  + (1 − λ)[ y1 J_{t+1}(1 − q, τ(x2)) + (1 − y1)J_{t+1}(p, τ(x2)) ]    (2.59)
= λJ_t^1(x1, x2) + (1 − λ)J_t^1(y1, x2).    (2.60)
Lemma 5 (Commutativity). J_t(x1, x2) = J_t(x2, x1).    (2.61)
Proof. This proof is by reverse induction on t. For t = T, we have

J_T(x1, x2) = max{ max(x1, x2), −c + x1 + x2 − x1x2, −c + x2 + x1 − x2x1 }    (2.62)
            = max{ max(x2, x1), −c + x2 + x1 − x2x1, −c + x1 + x2 − x1x2 }    (2.63)
            = J_T(x2, x1).    (2.64)
Now assume (2.61) holds for time t + 1. Then we have

J_t^1(x1, x2) = −c + x1 + x2 − x1x2 + x1 J_{t+1}(1 − q, τ(x2)) + (1 − x1)J_{t+1}(p, τ(x2))    (2.65)
             = −c + x2 + x1 − x2x1 + x1 J_{t+1}(τ(x2), 1 − q) + (1 − x1)J_{t+1}(τ(x2), p)    (2.66)
             = J_t^2(x2, x1).    (2.67)
Additionally, we have

J_t^0(x1, x2) = max(x1, x2) + J_{t+1}(τ(x1), τ(x2))    (2.68)
             = max(x2, x1) + J_{t+1}(τ(x2), τ(x1))    (2.69)
             = J_t^0(x2, x1).    (2.70)
Finally, we can use these two results to show that

J_t(x1, x2) = max{ J_t^0(x1, x2), J_t^1(x1, x2), J_t^2(x1, x2) }    (2.71)
            = max{ J_t^0(x2, x1), J_t^2(x2, x1), J_t^1(x2, x1) }    (2.72)
            = J_t(x2, x1).    (2.73)

The proof follows by induction.
Let Φt (0), Φt (1), Φt (2) be the sets of (x1 , x2 ) such that it is optimal to not probe,
probe channel 1, and probe channel 2 respectively at time t.
Lemma 6 (Probe Symmetry). If (x1 , x2 ) ∈ Φt (1), then (x2 , x1 ) ∈ Φt (2).
Proof. If (x1, x2) ∈ Φt(1), then J_t^1(x1, x2) ≥ J_t^2(x1, x2) and J_t^1(x1, x2) ≥ J_t^0(x1, x2). Using Lemma 5, we can then say that J_t^2(x2, x1) ≥ J_t^1(x2, x1) and J_t^2(x2, x1) ≥ J_t^0(x2, x1), which implies (x2, x1) ∈ Φt(2).
Lemma 7 (No-Probe Symmetry). If (x1 , x2 ) ∈ Φt (0), then (x2 , x1 ) ∈ Φt (0).
Proof. If (x1, x2) ∈ Φt(0), then J_t^0(x1, x2) ≥ J_t^1(x1, x2) and J_t^0(x1, x2) ≥ J_t^2(x1, x2). It follows from Lemma 5 that J_t^0(x1, x2) = J_t^0(x2, x1) and J_t^2(x1, x2) = J_t^1(x2, x1), which implies J_t^0(x2, x1) ≥ J_t^1(x2, x1). By a similar argument, we can show J_t^0(x2, x1) ≥ J_t^2(x2, x1), and therefore (x2, x1) ∈ Φt(0).
These last two lemmas show that the optimal decision regions are symmetric about
the line x1 = x2 .
Lemmas 4-7 combine to prove a convexity result on the expected reward function.
Theorem 10 (Convexity). For all t, Jt (x1 , x2 ) is convex in x1 for fixed x2 , and is
convex in x2 for fixed x1 .
Proof. This is proved by reverse induction over t. For t = T,

J_T(x1, x2) = max{ max(x1, x2), −c + x1 + x2 − x1x2, −c + x2 + x1 − x2x1 }    (2.74)
is convex in each element since each argument to the maximum is convex (or affine)
and the maximum of convex functions is also convex. Now consider t < T , and we
assume that Jt+1 (x1 , x2 ) is convex in x1 for fixed x2 . To begin with, we note that
τ(λx1 + (1 − λ)y1) = (1 − q)(λx1 + (1 − λ)y1) + p(1 − λx1 − (1 − λ)y1)    (2.75)
                   = (1 − q)λx1 + pλ(1 − x1) + (1 − q)(1 − λ)y1 + p(1 − λ)(1 − y1)    (2.76)
                   = λτ(x1) + (1 − λ)τ(y1).    (2.77)
First we consider the expected throughput after not probing.
J_t^0(λx1 + (1 − λ)y1, x2) = max(λx1 + (1 − λ)y1, x2) + J_{t+1}(τ(λx1 + (1 − λ)y1), τ(x2))    (2.78)
≤ λ max(x1, x2) + (1 − λ) max(y1, x2) + J_{t+1}(λτ(x1) + (1 − λ)τ(y1), τ(x2))    (2.79)
≤ λ max(x1, x2) + (1 − λ) max(y1, x2) + λJ_{t+1}(τ(x1), τ(x2)) + (1 − λ)J_{t+1}(τ(y1), τ(x2))    (2.80)
= λJ_t^0(x1, x2) + (1 − λ)J_t^0(y1, x2),    (2.81)
where (2.79) holds by the convexity of max(x, ·) and the linearity of τ(·), and (2.80) holds from the induction hypothesis. Additionally, J_t^1(x1, x2) is convex in x1 by Lemma 4. For J_t^2(x1, x2), we have:
J_t^2(λx1 + (1 − λ)y1, x2)
= −c + λx1 + (1 − λ)y1 + x2 − (λx1 + (1 − λ)y1)x2
  + x2 J_{t+1}(τ(λx1 + (1 − λ)y1), 1 − q) + (1 − x2)J_{t+1}(τ(λx1 + (1 − λ)y1), p)    (2.82)
= λ(−c + x1 + x2 − x1x2) + (1 − λ)(−c + y1 + x2 − y1x2)
  + x2 J_{t+1}(λτ(x1) + (1 − λ)τ(y1), 1 − q) + (1 − x2)J_{t+1}(λτ(x1) + (1 − λ)τ(y1), p)    (2.83)
≤ λ(−c + x1 + x2 − x1x2) + (1 − λ)(−c + y1 + x2 − y1x2)
  + λ[ x2 J_{t+1}(τ(x1), 1 − q) + (1 − x2)J_{t+1}(τ(x1), p) ]
  + (1 − λ)[ x2 J_{t+1}(τ(y1), 1 − q) + (1 − x2)J_{t+1}(τ(y1), p) ]    (2.84)
= λJ_t^2(x1, x2) + (1 − λ)J_t^2(y1, x2).    (2.85)
Thus, each of J_t^0(x1, x2), J_t^1(x1, x2), and J_t^2(x1, x2) is convex in x1 for fixed x2, and therefore J_t(x1, x2) is convex in x1 as well. The second half of the statement holds by symmetry.
Using the convexity of the expected reward function, we can find sufficient conditions for probing optimality for a given state.
Theorem 11. If at any time slot t the system state (x1(t), x2(t)) satisfies

c ≤ min(x1(t), x2(t)) (1 − max(x1(t), x2(t))),    (2.86)

then it is optimal to probe at slot t.
Proof.

J_t^0(x1, x2) = max(x1, x2) + J_{t+1}(τ(x1), τ(x2))    (2.87)
≤ max(x1, x2) + x1 J_{t+1}(1 − q, τ(x2)) + (1 − x1)J_{t+1}(p, τ(x2))    (2.88)
= max(x1, x2) + J_t^1(x1, x2) + c − x1 − x2 + x1x2,    (2.89)
where (2.88) follows from Theorem 10. Therefore, J_t^0(x1, x2) − J_t^1(x1, x2) ≤ 0 if

c − x1 − x2 + x1x2 + max(x1, x2) ≤ 0    (2.90)
c ≤ min(x1, x2) (1 − max(x1, x2)).    (2.91)
While the convexity bound yields sufficient conditions for probing optimality, necessary conditions do not follow directly from this analysis. Additionally, the convexity
bound used in (2.88) is loose, and thus probing is often optimal even in states which
do not satisfy the conditions of Theorem 11.
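The sufficient condition (2.86) is cheap to test online. A minimal helper (names are ours):

```python
def probe_is_provably_optimal(x1, x2, c):
    """Sufficient condition of Theorem 11: if the probing cost c is at most
    min(x1, x2) * (1 - max(x1, x2)), probing is optimal this slot. The converse
    fails in general: probing may still be optimal when this returns False."""
    return c <= min(x1, x2) * (1 - max(x1, x2))

print(probe_is_provably_optimal(0.5, 0.5, 0.2))   # -> True  (0.2 <= 0.25)
print(probe_is_provably_optimal(0.9, 0.8, 0.2))   # -> False (0.2 >  0.08)
```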
2.5.2 State Action Frequency Formulation
The channel probing MDP can also be modeled as an infinite horizon, average cost
problem. In this case, it can be formulated as a linear program (LP) in terms of
state action frequencies, which can be solved to determine the optimal policy. A
state action frequency vector ω(s; a) exists for each state and potential action, and
corresponds to a stationary randomized policy such that ω(s; a) equals the steady
state probability that at a given time slot, the state is s and the action taken is
a. Let s = (s1 , k1 , s2 , k2 ), where s1 and s2 are the last known states of the two
channels respectively, and k1 and k2 are the respective times since the last probe on
each channel. We use this notation rather than the belief notation to emphasize the
countable nature of the state space. Furthermore, the action a satisfies a ∈ {0, 1, 2},
representing the actions of not probing, probing channel 1, and probing channel 2
respectively.
As mentioned above, the state space is countably infinite and therefore the resulting state action frequency LP is intractable; however, we can approximate the
optimal solution by truncating the state space at a large finite value. In particular,
assume that ki takes values between 0 and Kmax , where Kmax is a predefined constant.
When ki = Kmax and channel i is not probed, then let ki = Kmax at the next slot as well. Clearly, as Kmax increases, p_{11}^{Kmax} approaches π, and the truncated formulation approaches the countable state space formulation. Since the belief of each channel approaches steady state exponentially fast, this truncation method can be used to find a near-optimal solution to the stochastic control problem. See [31] for details.
The state action frequency formulation is presented in (2.92)-(2.103).

Max.  Σ_a Σ_{s1,s2,k1,k2} ω(s1, k1, s2, k2; a) r(s1, k1, s2, k2; a)    (2.92)

s.t.  Σ_a Σ_{s1,s2,k1,k2} ω(s1, k1, s2, k2; a) = 1    (2.93)

Σ_a ω(s1, 1, s2, k2; a) = Σ_{k1=1}^{Kmax} Σ_{s1'} ω(s1', k1, s2, k2 − 1; 1) p_{s1',s1}^{k1}    ∀s1, s2, 2 ≤ k2 ≤ Kmax − 1    (2.94)

Σ_a ω(s1, k1, s2, 1; a) = Σ_{k2=1}^{Kmax} Σ_{s2'} ω(s1, k1 − 1, s2', k2; 2) p_{s2',s2}^{k2}    ∀s1, s2, 2 ≤ k1 ≤ Kmax − 1    (2.95)

Σ_a ω(s1, 1, s2, Kmax; a) = Σ_{k1=1}^{Kmax} Σ_{s1'} p_{s1',s1}^{k1} [ ω(s1', k1, s2, Kmax − 1; 1) + ω(s1', k1, s2, Kmax; 1) ]    ∀s1, s2    (2.96)

Σ_a ω(s1, Kmax, s2, 1; a) = Σ_{k2=1}^{Kmax} Σ_{s2'} p_{s2',s2}^{k2} [ ω(s1, Kmax − 1, s2', k2; 2) + ω(s1, Kmax, s2', k2; 2) ]    ∀s1, s2    (2.97)

Σ_a ω(s1, k1, s2, k2; a) = ω(s1, k1 − 1, s2, k2 − 1; 0)    ∀s1, s2, 2 ≤ k1, k2 ≤ Kmax    (2.98)

Σ_a ω(s1, Kmax, s2, k2; a) = ω(s1, Kmax − 1, s2, k2 − 1; 0) + ω(s1, Kmax, s2, k2 − 1; 0)    ∀s1, s2, 2 ≤ k2 ≤ Kmax    (2.99)

Σ_a ω(s1, k1, s2, Kmax; a) = ω(s1, k1 − 1, s2, Kmax − 1; 0) + ω(s1, k1 − 1, s2, Kmax; 0)    ∀s1, s2, 2 ≤ k1 ≤ Kmax    (2.100)

Σ_a ω(s1, Kmax, s2, Kmax; a) = ω(s1, Kmax, s2, Kmax; 0) + ω(s1, Kmax − 1, s2, Kmax − 1; 0) + ω(s1, Kmax − 1, s2, Kmax; 0) + ω(s1, Kmax, s2, Kmax − 1; 0)    ∀s1, s2    (2.101)

r(s1, k1, s2, k2; a) = −c + p_{s1,1}^{k1} + p_{s2,1}^{k2} − p_{s1,1}^{k1} p_{s2,1}^{k2}    ∀a ∈ {1, 2}    (2.102)

r(s1, k1, s2, k2; 0) = max( p_{s1,1}^{k1}, p_{s2,1}^{k2} )    (2.103)

Equation (2.92) is the objective, maximizing the average reward, where the reward functions are
defined for each possible action in (2.102) and (2.103). Equation (2.93) is a normalization constraint, ensuring that the state action frequencies sum to one. Equations
(2.98) through (2.101) are balance equations for the case when the chosen action is
to not probe. Note that we include constraints to deal with the truncation of the
state space. Equations (2.94) and (2.96) deal with the evolution of the state when
channel 1 is probed, and equations (2.95) and (2.97) deal with the case when channel
2 is probed.
For weakly communicating finite state and action MDPs, there exists a solution
to the state action frequency LP that corresponds to a deterministic stationary policy
[31]. Specifically, for all recurrent states s in the solution, the state action frequencies
ω(s; a) > 0 for some a, and since the optimal policy is deterministic, ω(s; a) > 0
is satisfied for only one value of a, which is the optimal decision at that state, and
ω(s; a) = 0 for all other actions. Since transient states are only visited finitely often,
they have zero state action frequencies for every action. Note that solving the SAF LP with a simplex-based solver (e.g., CPLEX) returns the deterministic solution.
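Once the LP is solved, recovering the deterministic policy is mechanical: each recurrent state has exactly one action with positive frequency. A sketch of this extraction step, using a small fabricated ω purely for illustration (these values are not the output of an actual solve):

```python
def extract_policy(omega, eps=1e-9):
    """Map each recurrent state s to the unique action a with omega[(s, a)] > 0.
    Transient states never appear, since their frequencies are all zero."""
    policy = {}
    for (s, a), w in omega.items():
        if w > eps:
            policy[s] = a
    return policy

# Toy illustration: states are (s1, k1, s2, k2); actions 0 = no probe, 1, 2 = probe.
omega = {
    ((1, 1, 0, 3), 0): 0.25,
    ((1, 2, 0, 4), 2): 0.25,
    ((1, 3, 1, 1), 0): 0.50,
    ((0, 5, 0, 5), 1): 0.0,     # transient state: zero frequency, excluded
}
policy = extract_policy(omega)
```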
The solution to the state action frequency LP for sample parameters is shown in
Figure 2-11. This plot shows the optimal decision as a function of the belief of channel
1 (x1 ) and the belief of channel 2 (x2 ). The system state can only reach a countable
subset of the points on the x1 -x2 plane. Under any policy, except for the policy where
a channel is never probed, there is a single recurrent class of states, and only states
in this class will have non-zero state action frequencies. From any recurrent state,
if the optimal decision is not to probe, the system state will move to the next state
(τ(x1), τ(x2)). The coordinates (τ^k(x1), τ^k(x2)) represent a line between (x1, x2) and
(π, π) parameterized by k. Thus, while the transmitter refrains from probing, the
system state follows a trajectory between the current state to (π, π). Based on this
observation, and the results in Figure 2-11, we can characterize the structure of the
optimal probing algorithm.
For a given set of parameters, there exists a probing region, e.g., the dotted convex
region in Figure 2-11, and a point (π, π), denoted by the dot in the center of Figure
[Figure 2-11 appears here: optimal decisions plotted on the x1-x2 belief plane (Belief of Channel 1 vs. Belief of Channel 2), with the point (π, π) marked; p = 0.03, q = 0.03, c = 0.5.]
Figure 2-11: Optimal decisions based on SAFs. White space corresponds to transient
states under the optimal policy, and green circles, red boxes, and blue stars correspond
to recurrent states where the optimal action is to not probe, probe channel 1, and probe
channel 2 respectively.
2-11. At each time slot, if the current state lies outside of the probing region, the
optimal decision is to not probe, and the state moves along the linear trajectory
toward (π, π). When the state lies on or inside the probing region, the controller
probes one of the channels. The state reached after the channel probe corresponds
to a point on the edge of the unit square in Figure 2-11, since the belief of a probed
channel is either 0 or 1. Then the process repeats, and the state will follow a new
trajectory toward the point (π, π). Therefore, the region for which probing is optimal
translates to a threshold policy, where probing becomes optimal after a certain time,
given by the distance between the point on the edge of the unit square, and the
probing region.
If the point (π, π) lies outside of the probing region, then there exists a trajectory
to (π, π) that does not intersect the probing region. If this is the case, the state after a
channel probe will eventually be a point on the unit square such that the line between
that point and (π, π) does not cross the probing optimality region, and the optimal
decision is to never probe and the state monotonically approaches (π, π) along the
linear trajectory. In this situation, all states are transient under the optimal policy.
In summary, the optimal time between probes is given by the distance between the
state immediately following a probe and the state on the boundary of the probing
region, lying on the line between the current state and (π, π). To find the probing
region, and the decisions to make at each point on the probing regions, the SAF LP
in (2.92)-(2.103) must be solved.
2.5.3 Infinite-Channel System
For a system with more than two channels, the previous approaches can be used to
formulate the problem of finding the optimal probing intervals. The drawback of these
approaches is that the state space grows exponentially with the number of channels,
and it becomes impractical to solve the MDP approach in Section 2.5.1 and the state
action frequency LP in section 2.5.2. However, in the asymptotic limit of the number
of channels, the infinite channel assumption in Section 2.4 can be applied to greatly
simplify the state space, and new approaches can be developed to characterize the
optimal probing intervals. The optimal intervals are related to the underlying probing
policy used to select the channels to probe. In this section, we consider two of the
channel probing policies from Section 2.4: the probe best policy and the round robin
policy, and characterize the optimal intervals at which to probe.
To begin, assume the decision of which channel to probe is given by the probe best
policy. The optimal probing interval is characterized by the following theorem.
Theorem 12. For a system in which the transmitter only probes the channel with the highest belief, the optimal probing decision is to probe immediately after probing an OFF channel, and to probe k* slots after probing an ON channel, where k* is given by

k* = arg max_k  (1 / (kπ + p_{10}^k)) ( πp_{10}^k / (p + q) − c(π + p_{10}^k) ).    (2.104)
Proof. As a result of Theorem 5, under the probe best policy, the belief of the best
channel x1 at every slot satisfies x1 ≥ π, and the belief of every other channel equals
π. When a probed channel is OFF, it is removed from the system, and the belief of
every channel is π, representing a state in which the transmitter has no knowledge of
the system. The system remains in this state until an ON channel is found, as each
OFF channel which is probed is removed from the system. If the optimal decision
in this state is to not probe, then the transmitter never probes, since the state never
changes. Thus, if it is optimal to probe in the state where the transmitter has no
knowledge, then it is optimal to probe immediately after an OFF channel is probed.
When a probed channel is ON, the highest belief is always 1 − q in the next slot, and
decays until that channel is probed again, as it will always remain the channel with
the highest belief. Hence, there exists a threshold k ∗ after an ON probe such that
after that time, it becomes optimal to probe.
Assume a probe occurs in the slot immediately after probing an OFF channel, and
let k denote the number of slots after probing an ON channel until the best channel
is probed again. Define a renewal to occur when the transmitter probes an OFF
channel. It follows that the inter-renewal time is one slot if the next probed channel
is OFF, and 1 + kN if the probed channel is ON, where N is a random variable
equal to the number of times the ON channel is probed until it turns OFF. Thus, the
expected inter-renewal time is given by
X̄_B = (1 − π) + π(1 + kE[N])    (2.105)
     = 1 + πkE[N].    (2.106)
The random variable N is geometrically distributed with parameter p_{10}^k. The reward accumulated over this interval is π if the probed channel is OFF, and N times Σ_{i=0}^{k−1} p_{11}^i if the channel is ON, plus an additional π after the final OFF probe. A cost
of c is incurred for each channel probe within this interval. The expected reward is
given by
R̄_B = (1 − π)(π − c) + π( E[N]( Σ_{i=0}^{k−1} p_{11}^i − c ) + π − c )    (2.107)
     = (π − c) + πE[N]( Σ_{i=0}^{k−1} p_{11}^i − c ).    (2.108)
Therefore, the average per-time-slot reward is given by the ratio of the expected reward over a renewal interval to the expected length of the renewal interval:

R̄_B / X̄_B = ( p_{10}^k(π − c) + π( Σ_{i=0}^{k−1} p_{11}^i − c ) ) / ( p_{10}^k + kπ )    (2.109)
           = π − c(π + p_{10}^k)/(kπ + p_{10}^k) + πp_{10}^k/((p + q)(kπ + p_{10}^k)).    (2.110)
The maximizing value of k in equation (2.110) is the optimal time k* to wait after an
ON probe.
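The one-dimensional maximization over k in (2.110) is straightforward to carry out by direct search; the sketch below uses the parameters of Figure 2-13 (p = q = 0.1, c = 0.5).

```python
def avg_reward_probe_best(k, p, q, c):
    """Per-slot reward of probe best when waiting k slots after an ON probe,
    per Eq. (2.110)."""
    pi = p / (p + q)
    p10k = (1 - pi) * (1 - (1 - p - q) ** k)    # P(OFF k slots after ON)
    d = k * pi + p10k
    return pi - c * (pi + p10k) / d + pi * p10k / ((p + q) * d)

# Parameters of Figure 2-13: p = q = 0.1, c = 0.5
k_star = max(range(1, 200), key=lambda k: avg_reward_probe_best(k, 0.1, 0.1, 0.5))
print(k_star)  # -> 5
```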
Theorem 12 characterizes the optimal probing interval under the probe best policy.
If the probing policy changes, the optimal interval changes as well. Nevertheless, the
following result shows that under the round-robin policy, the optimal probing interval
has a similar structure.
Theorem 13. For a system in which the transmitter probes channels according to the round robin policy, the optimal decision is to probe a new channel immediately after probing an OFF channel, and to probe k' slots after probing an ON channel, where k' is given by

k' = arg max_k ( −c(p + q) + p E_N[ Σ_{i=0}^{k+N−2} p_{11}^i ] ) / ( p(k − 1) + p + q ),    (2.111)

where N is a geometrically distributed random variable with parameter π.
Proof. In contrast to Theorem 12, there is no analog to Theorem 5 for round robin
probing. Thus, we first prove the optimal form of the policy is a threshold policy,
by proving the monotonicity of the expected reward function. Given the structure of
the optimal policy, renewal theory is applied to characterize the optimal interval. To
begin, we can write the expected reward for probing k slots after the previous probe
as a function of k over a finite horizon.
J_T(k) = max{ p_{11}^k, −c + π + (1 − π)p_{11}^k }    (2.112)
J_t(k) = max{ p_{11}^k + J_{t+1}(k + 1), −c + π(1 + J_{t+1}(1)) + (1 − π)(p_{11}^k + J_{t+1}(k + 1)) },    (2.113)
where the left argument to the max(·, ·) function represents the expected reward from
not probing, and the right argument represents the expected reward from probing an
unknown channel.
Under round robin probing, J_t is monotonically decreasing in k for all t. To see this, let t = T, and first suppose k satisfies πp_{10}^k ≥ c. Then

J_T(k) = max{ p_{11}^k, −c + π + (1 − π)p_{11}^k } = p_{11}^k + max(0, −c + πp_{10}^k)    (2.114)
       = p_{11}^k − c + π(1 − p_{11}^k) = p_{11}^k(1 − π) + π − c,    (2.115)

which is monotonically decreasing in k, since p_{11}^k is a monotonically decreasing function of k. If on the other hand πp_{10}^k ≤ c, then J_T(k) = p_{11}^k, which is monotonically decreasing in k.
Now assume t < T and that the hypothesis holds for t + 1, . . . , T; we will show that it holds for t. Let g(k) = p_{11}^k + J_{t+1}(k + 1). By induction, g(k) is monotonically decreasing in k, and using the analysis from the base case, the expression

J_t(k) = max{ g(k), −c + π(1 + J_{t+1}(1)) + (1 − π)g(k) }    (2.116)

is also monotonically decreasing in k.
The remaining proof of Theorem 13 follows by reverse induction over the time horizon. Assume there is a k' such that it is optimal to probe at time T. Consider k ≥ k'. It is optimal to probe if c ≤ πp_{10}^k. However, c ≤ πp_{10}^{k'} since it is optimal to probe at k', and p_{10}^k ≥ p_{10}^{k'}. Therefore, it is also optimal to probe at k.
Now consider t ≤ T, and assume our induction hypothesis holds for t + 1. The difference between the arguments to max(·, ·) in (2.113) can be bounded as follows:

−c + π(1 + J_{t+1}(1)) + (1 − π)(p_{11}^k + J_{t+1}(k + 1)) − p_{11}^k − J_{t+1}(k + 1)    (2.117)
= −c + π(1 + J_{t+1}(1)) − π(p_{11}^k + J_{t+1}(k + 1))    (2.118)
≥ −c + π(1 + J_{t+1}(1)) − π(p_{11}^{k'} + J_{t+1}(k' + 1))    (2.119)
≥ 0,    (2.120)

where the first inequality holds from the monotonic property of the J function, and the second inequality holds from the assumption that it is optimal to probe for k'. Therefore, it is optimal to probe at t, and by induction, it is optimal to probe k' slots after an ON probe for some value of k'.
To characterize the optimal value of k 0 , we introduce renewal theory using the
renewals defined in Section 2.4.3. Recall, a renewal occurs upon probing a channel
which is ON. The expected time until the next renewal is the k 0 slots until the next
probe, plus the number of slots it takes to find a new ON channel. Let N be the
number of probes until an ON channel is found, which is geometrically distributed
with parameter π. The expected inter-renewal time is given by
X̄_R = E_N[k + N − 1].    (2.121)
Over this interval, a cost of c is incurred for each of the N channel probes, and at each
time slot the transmitter uses the last known ON channel for transmission. Thus, the
expected reward is given by
R̄_R = E_N[ 1 − Nc + Σ_{i=0}^{k+N−2} p_{11}^i ].    (2.122)
To determine the optimal k', we maximize the ratio of the expected reward to the
expected length of the renewal interval, thus concluding the proof.
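The analogous search under round robin evaluates the renewal ratio R̄_R/X̄_R of (2.121)-(2.122) for each candidate k, truncating the expectation over the geometric N numerically; the truncation level and parameter values below are our own illustrative choices.

```python
def avg_reward_round_robin(k, p, q, c, n_max=200):
    """Renewal ratio of Eqs. (2.121)-(2.122) when waiting k slots after an ON
    probe; the expectation over the geometric N is truncated at n_max terms."""
    pi = p / (p + q)

    def p11(i):
        return pi + (1 - pi) * (1 - p - q) ** i     # P(ON i slots after ON)

    x_bar = k + 1.0 / pi - 1.0                      # E[k + N - 1], with E[N] = 1/pi
    r_bar = 0.0
    for n in range(1, n_max):
        w = pi * (1 - pi) ** (n - 1)                # P(N = n)
        r_bar += w * (1 - n * c + sum(p11(i) for i in range(k + n - 1)))
    return r_bar / x_bar

k_prime = max(range(1, 40), key=lambda k: avg_reward_round_robin(k, 0.1, 0.1, 0.5))
```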
Note that the optimal time to wait to probe after an ON probe under round robin
(k') in (2.111) differs from the optimal k* under the probe best policy in (2.104).
Figure 2-13 plots the average reward of round robin and probe best for different values
of k. Recall that under fixed probing intervals, Theorem 9 states that both policies
have the same average reward. However, under dynamic probing intervals, the probe
best policy outperforms the round robin policy. Figure 2-12 shows a comparison
between expected throughput of the optimal fixed-interval probing policy and the
optimal dynamic-interval policy under probe best and round robin. By looking at
the maxima in these graphs, we observe that for the chosen parameters, introducing a
dynamic probing-interval optimization yields an 8% gain in throughput under probe
best, and a 5% gain in throughput under round robin.
Based on the results of the fixed probing interval model, a natural extension to
the above analysis is to consider the probe second-best policy, which was conjectured
to be the optimal probing policy under fixed channel probing intervals. In contrast to
probe best and round robin, the optimal time until the next probe under the probe
second-best policy depends on the belief of the best channel after an ON channel is
probed, and consequently, probe second-best does not have a single solution for the
optimal probing interval after an ON channel has been probed. Thus, characterizing
(a) Probe Best Probing Policy
(b) Round Robin Probing Policy
Figure 2-12: Comparison of the expected throughput of the probe best policy and the
round robin policy under fixed intervals and under dynamic intervals. The x-axis plots k,
the length of the interval. The maxima of each graph represent the optimal policy in each
regime. In this example, p = q = 0.05 and c = 0.5.
Figure 2-13: Comparison of the probe best policy and round robin for varying values of k,
the minimum interval between probes. In this example, p = q = 0.1, and c = 0.5.
the optimal probing intervals is a more challenging problem in this context. It is
an interesting and open problem to determine if the probe second-best policy is still
optimal under dynamic probing intervals.
2.6 Conclusion
This chapter focuses on channel probing as a means of acquiring network state information, and optimizes the acquisition of this information in terms of which channels
to probe and how often to probe these channels. In contrast to the work in [2, 71]
that established the optimality of the myopic probe best policy, we showed that for
a slightly modified model, these results no longer hold. Under a two channel system,
we proved that probing either channel results in the same throughput, and under
an infinite channel system, we proved that a simple alternative, the probe second-best policy, outperforms the probe best policy in terms of average throughput. We
proved the optimality of the probe second-best policy in three channel systems, and
conjecture that probing the second-best channel is the optimal decision in a general
multi-channel system. Proving this conjecture is interesting, and remains an open
problem.
Additionally, we showed that dynamically optimizing the probing intervals based
on the results of the channel probe can additionally increase system throughput. We
characterized the optimal probing intervals in a two channel system by formulating
a Markov decision problem, and using a state action frequency approach to solve
the dynamic program. For the infinite channel case, we characterized the optimal
probing intervals subject to a fixed probing policy, namely the probe best policy and
the round robin probing policy. An extension to general probing policies, as well
as a joint optimization over the probing decisions and the probing intervals is an
interesting extension to this work.
2.7 Appendix
2.7.1 Proof of Lemma 1
Lemma 1: f^k(x1, x2) = f^k(x2, x1).
Proof of Lemma 1.
f^k(x2, x1) = x2 Σ_{i=0}^{k−1} p_{11}^i + (1 − x2) Σ_{i=0}^{k−1} τ^i(x1)
            = Σ_{i=0}^{k−1} [ x2 p_{11}^i + (1 − x2)τ^i(x1) ]    (2.123)
            = Σ_{i=0}^{k−1} [ x2 p_{11}^i + (1 − x2)(x1 p_{11}^i + (1 − x1)p_{01}^i) ]    (2.124)
            = Σ_{i=0}^{k−1} [ x1 p_{11}^i + (1 − x1)(x2 p_{11}^i + (1 − x2)p_{01}^i) ]    (2.125)
            = Σ_{i=0}^{k−1} [ x1 p_{11}^i + (1 − x1)τ^i(x2) ] = f^k(x1, x2).    (2.126)
Equation (2.124) follows from (2.5), and (2.126) follows from (2.7).
2.7.2 Proof of Theorem 1
Theorem 1: For a two-user system with independent channels evolving over time according to an ON/OFF Markov chain with transition probabilities p and q, and probing epochs fixed at intervals of k slots, the total reward from probing channel 1 at any probing epoch is equal to that of probing channel 2.
Proof of Theorem 1. This proof uses reverse induction on the probing index n. As a
base case, consider n = N .
J_N^1(x1, x2) = f^k(x1, x2) = f^k(x2, x1) = J_N^2(x1, x2).    (2.127)
Now assume J_{n+1}^1(x1, x2) = J_{n+1}^2(x1, x2), and we prove this holds for index n. First, we note that the function f^k(x1, x2) is affine in both x1 and x2. To see this, consider 0 ≤ λ ≤ 1.
λf k (a, x2 ) + (1 − λ)f k (b, x2 )
k−1 X
i
i
i
i
=
λap11 + λ(1 − a)τ (x2 ) + (1 − λ)bp11 + (1 − λ)(1 − b)τ (x2 )
=
i=0
k−1
X
pi11 (λa
+ (1 − λ)b) + τ (x2 ) λ(1 − a) + (1 − λ)(1 − b)
i
(2.128)
(2.129)
i=0
= f k (λa + (1 − λ)b, x2 )
(2.130)
As a consequence of Lemma 1, it also follows that
λf k (x2 , a) + (1 − λ)f k (x1 , b) = f k (x1 , λa + (1 − λ)b)
(2.131)
Using the above fact, we can show that both J_{n+1}^1 and J_{n+1}^2 are affine as well:

λ J_{n+1}^1(a, x_2) + (1 − λ) J_{n+1}^1(b, x_2)
= λ f^k(a, x_2) + λ [ a J_{n+2}(p_{11}^k, τ^k(x_2)) + (1 − a) J_{n+2}(p_{01}^k, τ^k(x_2)) ]
  + (1 − λ) f^k(b, x_2) + (1 − λ) [ b J_{n+2}(p_{11}^k, τ^k(x_2)) + (1 − b) J_{n+2}(p_{01}^k, τ^k(x_2)) ]   (2.132)
= f^k(λa + (1 − λ)b, x_2) + (λa + (1 − λ)b) J_{n+2}(p_{11}^k, τ^k(x_2))
  + (1 − λa − (1 − λ)b) J_{n+2}(p_{01}^k, τ^k(x_2))   (2.133)
= J_{n+1}^1(λa + (1 − λ)b, x_2)   (2.134)
By the induction hypothesis, since J_{n+1}^1(x_1, x_2) = J_{n+1}^2(x_1, x_2), it follows that
J_{n+1}^2 is affine in x_2 as well.
Using the results above, J_n^1(x_1, x_2) is written as

J_n^1(x_1, x_2) = f^k(x_1, x_2) + x_1 J_{n+1}(p_{11}^k, τ^k(x_2)) + (1 − x_1) J_{n+1}(p_{01}^k, τ^k(x_2))   (2.135)
= f^k(x_1, x_2) + x_1 J_{n+1}^1(p_{11}^k, τ^k(x_2)) + (1 − x_1) J_{n+1}^1(p_{01}^k, τ^k(x_2))   (2.136)
= f^k(x_1, x_2) + J_{n+1}^1(τ^k(x_1), τ^k(x_2))   (2.137)
= f^k(x_1, x_2) + J_{n+1}^2(τ^k(x_1), τ^k(x_2))   (2.138)
= f^k(x_2, x_1) + x_2 J_{n+1}^2(τ^k(x_1), p_{11}^k) + (1 − x_2) J_{n+1}^2(τ^k(x_1), p_{01}^k)   (2.139)
= f^k(x_2, x_1) + x_2 J_{n+1}(τ^k(x_1), p_{11}^k) + (1 − x_2) J_{n+1}(τ^k(x_1), p_{01}^k)   (2.140)
= J_n^2(x_1, x_2)   (2.141)

where equations (2.136), (2.138), and (2.140) follow from the induction hypothesis,
and equations (2.137) and (2.139) use the affinity of J_{n+1}^i and Lemma 1.
2.7.3 Proof of Theorem 3
Theorem 3: For a two-user system with channel states evolving as described above,
and probing instances fixed to intervals of k slots, if p_1, p_2, q_1, q_2 satisfy

b_{11}^i ≥ a_{11}^i   ∀i,   (2.142)

then the optimal probing strategy is to probe channel 2 at all probing instances.
Proof of Theorem 3. This proof will use induction on the horizon length of the corresponding DP problem.
Define the state transition functions

τ_1^i(x) = a_{11}^i x + (1 − x) a_{01}^i   (2.143)
τ_2^i(x) = b_{11}^i x + (1 − x) b_{01}^i   (2.144)
Base Case: Assume n = N. For this immediate-reward problem, the expected reward
functions simplify to the following:

J_N^1(x_1, x_2) = Σ_{i=0}^{k-1} [ x_1 max(a_{11}^i, τ_2^i(x_2)) + (1 − x_1) max(a_{01}^i, τ_2^i(x_2)) ]   (2.145)
J_N^2(x_1, x_2) = Σ_{i=0}^{k-1} [ x_2 max(b_{11}^i, τ_1^i(x_1)) + (1 − x_2) max(b_{01}^i, τ_1^i(x_1)) ]   (2.146)
Since we have assumed that b_{11}^i ≥ a_{11}^i, the following inequalities hold:

b_{11}^i ≥ a_{11}^i ≥ τ_1^i(x_1),   b_{01}^i ≤ a_{01}^i ≤ τ_1^i(x_1)   (2.147)
Consequently, we can rewrite (2.146) as

J_N^2(x_1, x_2) = Σ_{i=0}^{k-1} [ x_2 b_{11}^i + (1 − x_2) τ_1^i(x_1) ]
= Σ_{i=0}^{k-1} [ x_2 b_{11}^i + (1 − x_2) x_1 a_{11}^i + (1 − x_2)(1 − x_1) a_{01}^i ]   (2.148)

We consider two separate cases, depending on whether x_2 ≥ π_2 or x_2 < π_2.

Case 1: x_2 ≥ π_2. Equation (2.145) simplifies to
J_N^1(x_1, x_2) = Σ_{i=0}^{k-1} [ x_1 max(a_{11}^i, τ_2^i(x_2)) + (1 − x_1) τ_2^i(x_2) ]
= Σ_{i=0}^{k-1} [ x_1 max(a_{11}^i, τ_2^i(x_2)) + (1 − x_1) x_2 b_{11}^i + (1 − x_1)(1 − x_2) b_{01}^i ]   (2.149)
= Σ_{i=0}^{k-1} [ x_1 max(a_{11}^i, τ_2^i(x_2)) + x_2 b_{11}^i − x_1 x_2 b_{11}^i + (1 − x_1)(1 − x_2) b_{01}^i ]   (2.150)
= J_N^2(x_1, x_2) + Σ_{i=0}^{k-1} [ x_1 max(a_{11}^i, τ_2^i(x_2)) − x_1 x_2 b_{11}^i + (1 − x_1)(1 − x_2) b_{01}^i
  − (1 − x_2) x_1 a_{11}^i − (1 − x_2)(1 − x_1) a_{01}^i ]   (2.151)
≤ J_N^2(x_1, x_2) + Σ_{i=0}^{k-1} [ x_1 max(a_{11}^i, τ_2^i(x_2)) − x_1 x_2 b_{11}^i − x_1(1 − x_2) a_{11}^i ]   (2.152)
= J_N^2(x_1, x_2) + Σ_{i=0}^{k-1} max( x_1 a_{11}^i − x_1 x_2 b_{11}^i − (1 − x_2) x_1 a_{11}^i,
  x_1 τ_2^i(x_2) − x_1 x_2 b_{11}^i − (1 − x_2) x_1 a_{11}^i )   (2.153)
= J_N^2(x_1, x_2) + Σ_{i=0}^{k-1} max( x_1 x_2 (a_{11}^i − b_{11}^i), x_1 (1 − x_2)(b_{01}^i − a_{11}^i) )   (2.154)
≤ J_N^2(x_1, x_2)   (2.155)

In the above, (2.151) follows from subtracting (2.148), (2.152) follows from b_{01}^i ≤ a_{01}^i,
and (2.155) follows from b_{11}^i ≥ a_{11}^i.
Case 2: x_2 ≤ π_2. Equation (2.145) simplifies to

J_N^1(x_1, x_2) = Σ_{i=0}^{k-1} [ x_1 a_{11}^i + (1 − x_1) max(a_{01}^i, τ_2^i(x_2)) ]
= Σ_{i=0}^{k-1} [ x_1 a_{11}^i + (1 − x_1) max(a_{01}^i, τ_2^i(x_2)) + (1 − x_2) x_1 a_{11}^i − (1 − x_2) x_1 a_{11}^i ]   (2.156)
= Σ_{i=0}^{k-1} [ x_1 x_2 a_{11}^i + (1 − x_1) max(a_{01}^i, τ_2^i(x_2)) + (1 − x_2) x_1 a_{11}^i ]   (2.157)
= J_N^2(x_1, x_2) + Σ_{i=0}^{k-1} [ x_1 x_2 a_{11}^i + (1 − x_1) max(a_{01}^i, τ_2^i(x_2))
  − x_2 b_{11}^i − (1 − x_2)(1 − x_1) a_{01}^i ]   (2.158)
= J_N^2(x_1, x_2) + Σ_{i=0}^{k-1} max( x_2 (x_1 a_{11}^i + (1 − x_1) a_{01}^i) − x_2 b_{11}^i,
  x_1 x_2 (a_{11}^i − b_{11}^i) + (1 − x_1)(1 − x_2)(b_{01}^i − a_{01}^i) )   (2.159)
≤ J_N^2(x_1, x_2)   (2.160)

where (2.158) results from applying (2.148), and (2.160) comes from the assumptions
that a_{11}^i ≤ b_{11}^i and b_{01}^i ≤ a_{01}^i.
Inductive Step: Assume that J_l^1(x_1, x_2) ≤ J_l^2(x_1, x_2) for all n+1 ≤ l ≤ N; we now
prove that J_n^1(x_1, x_2) ≤ J_n^2(x_1, x_2). By the induction hypothesis, the optimal
cost-to-go satisfies J_{n+1}(x_1, x_2) = J_{n+1}^2(x_1, x_2). Combining expressions (2.145)
and (2.146) with the induction hypothesis, it follows that

Σ_{i=0}^{k-1} [ x_1 max(a_{11}^i, τ_2^i(x_2)) + (1 − x_1) max(a_{01}^i, τ_2^i(x_2)) ]
≤ Σ_{i=0}^{k-1} [ x_2 max(b_{11}^i, τ_1^i(x_1)) + (1 − x_2) max(b_{01}^i, τ_1^i(x_1)) ].   (2.161)

To conclude the proof:

x_1 J_{n+1}^2(a_{11}^k, τ_2^k(x_2)) + (1 − x_1) J_{n+1}^2(a_{01}^k, τ_2^k(x_2))
= J_{n+1}^2(τ_1^k(x_1), τ_2^k(x_2))   (2.162)
= x_2 J_{n+1}^2(τ_1^k(x_1), b_{11}^k) + (1 − x_2) J_{n+1}^2(τ_1^k(x_1), b_{01}^k)   (2.163)

where the above comes from the affinity of the function J_n(x_1, x_2), shown in
(2.132)-(2.134). Thus, combining (2.161) with (2.163) proves the theorem.
2.7.4 Proof of Lemmas 2 and 3
Lemma 8. Let g(x, y) be any function satisfying g(x, y) = ax + by + cxy + d for some
constants a, b, c, d. Then,
g(x, y) − g(y, x) = (x − y)(g(1, 0) − g(0, 1))
(2.164)
Proof.
g(x, y) − g(y, x) = ax + by + cxy + d − ay − bx − cyx − d
= (x − y)(a − b)
= (x − y)(g(1, 0) − g(0, 1))
Lemma 2: If x_1 ≥ x_2 ≥ x_3, then for all 0 ≤ n ≤ N,

W_n(x_1, x_2, x_3) ≥ W_n(x_2, x_1, x_3)

Proof of Lemma 2. The proof follows by reverse induction on the probing index n.
For time n = N,

W_N(x_1, x_2, x_3) − W_N(x_2, x_1, x_3) = f^k(x_1, x_2) − f^k(x_2, x_1) = 0   (2.165)

The above equality follows from Lemma 1. Assume the inductive hypothesis holds
for n + 1:

W_n(x_1, x_2, x_3) − W_n(x_2, x_1, x_3)
= (x_1 − x_2)(W_n(1, 0, x_3) − W_n(0, 1, x_3))   (2.166)
= (x_1 − x_2)( f^k(1, 0) + W_{n+1}(τ^k(1), τ^k(x_3), τ^k(0))
  − f^k(0, 1) − W_{n+1}(τ^k(1), τ^k(0), τ^k(x_3)) )   (2.167)
= (x_1 − x_2)( W_{n+1}(τ^k(1), τ^k(x_3), τ^k(0)) − W_{n+1}(τ^k(1), τ^k(0), τ^k(x_3)) )   (2.168)
≥ (x_1 − x_2)( W_{n+1}(τ^k(1), τ^k(0), τ^k(x_3)) − W_{n+1}(τ^k(1), τ^k(0), τ^k(x_3)) ) = 0   (2.169)

The inequality in (2.169) holds by the induction hypothesis of Lemma 3.
Lemma 3: If x_1 ≥ x_2 ≥ x_3, then for all 0 ≤ n ≤ N,

W_n(x_1, x_2, x_3) ≥ W_n(x_1, x_3, x_2)

Proof of Lemma 3. The proof follows by reverse induction on the probing index n.
For time n = N,

W_N(x_1, x_2, x_3) − W_N(x_1, x_3, x_2) = f^k(x_1, x_2) − f^k(x_1, x_3)   (2.170)
= (x_2 − x_3) Σ_{i=0}^{k-1} ( p_{11}^i − τ^i(x_1) )   (2.171)
= (x_2 − x_3)(1 − x_1) Σ_{i=0}^{k-1} ( p_{11}^i − p_{01}^i ) ≥ 0   (2.172)

where the inequality follows from the positive memory assumption on the channel.
Now we assume the inductive hypotheses for Lemmas 2 and 3 hold for times at and
after n.

W_n(x_1, x_2, x_3) − W_n(x_1, x_3, x_2) = (x_2 − x_3)( W_n(x_1, 1, 0) − W_n(x_1, 0, 1) )   (2.173)
= (x_2 − x_3)( f^k(x_1, 1) + W_{n+1}(τ^k(1), τ^k(x_1), τ^k(0))
  − f^k(x_1, 0) − W_{n+1}(τ^k(x_1), τ^k(1), τ^k(0)) )   (2.174)
≥ (x_2 − x_3)( W_{n+1}(τ^k(1), τ^k(x_1), τ^k(0)) − W_{n+1}(τ^k(x_1), τ^k(1), τ^k(0)) )   (2.175)
≥ (x_2 − x_3)( W_{n+1}(τ^k(1), τ^k(x_1), τ^k(0)) − W_{n+1}(τ^k(1), τ^k(x_1), τ^k(0)) ) = 0   (2.176)

The inequality in (2.175) follows from (2.170)-(2.172). The inequality in (2.176)
follows from the inductive hypothesis of Lemma 2.
Chapter 3

Opportunistic Scheduling with Limited Channel State Information: A Rate Distortion Approach
Consider a transmitter and a receiver connected by two independent channels. The
state of each channel is either ON or OFF, where transmissions over an ON channel
result in a unit throughput, and transmissions over an OFF channel fail. Channels
evolve over time according to a Markov process. At the beginning of each time slot,
the receiver measures the channel states in the current slot, and transmits channel
state information (CSI) to the transmitter. Based on the CSI sent by the receiver,
the transmitter chooses over which of the channels to transmit.
In a system in which an ON channel and OFF channel are equally likely to occur,
the transmitter can achieve an expected per-slot throughput of 1/2 without CSI, and
a per-slot throughput of 3/4 if the transmitter has full CSI before making scheduling
decisions. However, the transmitter does not need to maintain complete knowledge of
the channel state in order to achieve high throughput; it is sufficient to only maintain
knowledge of which channel has the best state. Furthermore, the memory in the
system can be used to further reduce the required CSI. We are interested in the
minimum rate at which CSI must be sent to the transmitter in order to guarantee a lower
bound on expected throughput. This quantity represents a fundamental limit on the
overhead information required in this setting.
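The 1/2 and 3/4 figures above can be checked by direct enumeration over the four joint channel states. The sketch below assumes two independent channels, each ON with probability 1/2:

```python
from itertools import product

# Two channels, each ON (1) or OFF (0) with probability 1/2, independently.
states = list(product([0, 1], repeat=2))
prob = 1 / len(states)  # uniform over the four joint states

# Without CSI the transmitter always transmits on channel 1.
thpt_no_csi = sum(prob * s[0] for s in states)

# With full CSI it transmits on the best channel in every slot.
thpt_full_csi = sum(prob * max(s) for s in states)

assert thpt_no_csi == 0.5
assert thpt_full_csi == 0.75
```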
The above minimization can be formulated as a rate distortion optimization with
an appropriately designed distortion metric. In particular, the rate distortion function provides a lower bound on the rate at which the transmitter must obtain CSI in
order to satisfy an average throughput constraint. The opportunistic communication
framework, in contrast to traditional rate distortion, requires that the CSI sequence
be causally encoded, as the receiver observes the channel states in real time. Consequently, restricting the rate distortion problem to causal encodings provides a tighter
lower bound on the required CSI that must be provided to the transmitter.
Opportunistic scheduling is one of many network control schemes that require
network state information (NSI) in order to make control decisions. The performance
of these schemes is directly influenced by the availability and accuracy of this information. However, the overhead required to convey this information is often ignored,
leading to inaccurate performance guarantees. If the network state changes rapidly,
there are more possibilities to take advantage of an opportunistic performance gain,
albeit at the cost of additional overhead. For large networks, this overhead becomes
prohibitive. Therefore, it is increasingly important to quantify the minimum amount
of information that must be conveyed in order to implement efficient network control, as well as investigate schemes for controlling a network with limited feedback
information.
This chapter presents a novel rate distortion formulation to quantify the fundamental limit on the rate of overhead required for opportunistic scheduling.1 We
design a new distortion metric for this setting that captures the impact of the availability of CSI on network performance, and incorporate a causality constraint to the
rate distortion formulation to reflect practical constraints of a real-time communication system. We analytically compute a closed-form expression for the causal rate
distortion lower bound for a two-channel system. Additionally, we propose a practical
encoding algorithm to achieve the required throughput with limited overhead. Moreover, we show that for opportunistic scheduling, there is a fundamental gap between
1. An earlier version of this work appeared in [38].
the mutual information and entropy-rate-based rate distortion functions, and discuss
scenarios under which this gap vanishes.
The remainder of this chapter is outlined as follows. The problem is formally
presented in Section 3.1. Section 3.2 contains our main result, the formulation and
solution to the causal information rate distortion problem. In Section 3.3, we present
an algorithm for encoding the channel state information and quantify its performance.
Lastly, in Section 3.4, we analyze the gap between the operational and information
rate distortion functions.
3.1 System Model
Consider a transmitter and a receiver, connected through M independent channels,
as shown in Figure 3-1. Assume a time-slotted system, where at time-slot t, each
channel has a time-varying channel state Si (t) ∈ {OFF, ON}, independent from all
other channels. The notation Si (t) ∈ {0, 1} is used interchangeably.
Figure 3-1: System Model: A transmitter and receiver connected by M independent channels.
Let X(t) = Xt = {S1 (t), S2 (t), . . . , SM (t)} represent the system state at time slot
t. At each time slot, the transmitter chooses a channel over which to transmit, with
the goal of opportunistically transmitting over an ON channel. Channel states evolve
over time according to a Markov process described by the chain in Figure 3-2, with
transition probabilities p and q satisfying 1−p−q ≥ 0, corresponding to channels with
“positive memory.” A channel with positive memory is more likely to be ON if it was
ON at the previous time, than if it was OFF. Let π be the steady state probability
of the channel being in the ON state. For this channel state model, π = p/(p + q).
Figure 3-2: Markov chain describing the channel state evolution of each independent channel (OFF→ON with probability p, ON→OFF with probability q).
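As an illustration of this model, the sketch below simulates a single ON/OFF channel and checks the stationary ON probability against π = p/(p + q); the transition probabilities and seed are chosen arbitrarily:

```python
import random

def stationary_on_prob(p, q, slots=200_000, seed=1):
    """Empirically estimate the stationary probability that a single
    ON/OFF Markov channel is ON: P(OFF->ON) = p, P(ON->OFF) = q."""
    random.seed(seed)
    state, on_count = 0, 0
    for _ in range(slots):
        flip = p if state == 0 else q
        if random.random() < flip:
            state = 1 - state
        on_count += state
    return on_count / slots

p, q = 0.2, 0.3
pi = p / (p + q)  # closed form: 0.4
assert abs(stationary_on_prob(p, q) - pi) < 0.02
```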
Figure 3-3: Information structure of an opportunistic communication system. The receiver measures the channel state X(t), encodes this into a sequence Z(t), and transmits this sequence to the transmitter.
The transmitter does not observe the state of the system. Instead, the receiver
causally encodes the sequence of channel states X_1^n into the sequence Z_1^n and sends
the encoded sequence to the transmitter, as illustrated in Figure 3-3, where X_1^n is
used to denote the vector of random variables [X(1), . . . , X(n)]. The encoding Z(t) =
Zt ∈ {1, . . . , M } represents the index of the channel over which to transmit. Since
the throughput-optimal transmission decision is to transmit over the channel with
the best state, it is sufficient for the transmitter to restrict its knowledge to the index
of the channel with the best state at each time. We assume that the feedback of
the index Z(t) happens instantaneously (with zero delay), and the objective is to
minimize the information rate over the feedback link.
The expected throughput earned in slot t is E[thpt(X(t), Z(t))] = E[S_{Z(t)}(t)], since
the transmitter uses channel i = Z(t), and receives a throughput of 1 if that channel
is ON, and 0 otherwise. Clearly, a higher throughput is attainable with more accurate
CSI. We define a distortion measure between a sequence of channel states x_1^n and an
encoded sequence z_1^n to measure the quality of the encoding. The average distortion
between the sequences x_1^n and z_1^n is defined in terms of the per-letter distortion,

d(x_1^n, z_1^n) = (1/n) Σ_{i=1}^n d(x_i, z_i),   (3.1)
where d(xi , zi ) is the per-letter distortion between the ith source symbol and the ith
encoded symbol at the transmitter. Traditionally, the distortion metric measures the
distance between the two sequences. Common examples are a Hamming distortion
metric, which measures the probability of error between two sequences, and the mean-squared error distortion metric. In the opportunistic communication setting, these
traditional distortion metrics are inappropriate, since the transmitter does not need
to know the channel state of each of the channels, but rather which channel yields
the highest transmission rate. Thus, for the opportunistic communication framework,
the per-letter distortion is defined as
d(x_i, z_i) ≜ 1 − S_{z_i}(i),   (3.2)

where S_{z_i}(i) is the state in slot i of the channel indexed by z_i. This definition quantifies the
loss in throughput incurred by transmitting over channel z_i. Consequently, an upper bound
on expected distortion translates to a lower bound on expected throughput.
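The link between distortion and throughput can be made concrete with a small sketch: the per-letter distortion of (3.2) is evaluated on a toy state sequence and an encoding that always selects channel 1 (the sequences are hypothetical, for illustration only):

```python
def per_letter_distortion(x, z):
    """d(x, z) = 1 - S_z: zero if the indexed channel is ON, one otherwise.
    x is a tuple of channel states, z a 1-based channel index."""
    return 1 - x[z - 1]

# A toy state sequence for M = 2 channels and an encoding that always
# points at channel 1 (illustrative values, not from the thesis).
xs = [(1, 0), (0, 1), (1, 1), (0, 0)]
zs = [1, 1, 1, 1]
D = sum(per_letter_distortion(x, z) for x, z in zip(xs, zs)) / len(xs)
throughput = 1 - D  # expected throughput is one minus average distortion
assert D == 0.5 and throughput == 0.5
```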
3.1.1 Problem Formulation
The goal in this chapter is to determine the minimum rate at which CSI must be conveyed
to the transmitter to achieve a lower bound on expected throughput. In this setting,
CSI must be conveyed to the transmitter causally; in other words, the ith encoding
can only depend on the channel state at time i, and previous channel states and
encodings. Let Q_c(D) be the family of causal encodings q(z_1^n | x_1^n) satisfying

E[d(x_1^n, z_1^n)] = Σ_{x_1^n} Σ_{z_1^n} p(x_1^n) q(z_1^n | x_1^n) d(x_1^n, z_1^n) ≤ D,   (3.3)

where p(x_1^n) is the PDF of the source, and the causality constraint:

q(z_1^i | x_1^n) = q(z_1^i | y_1^n)   ∀ x_1^n, y_1^n s.t. x_1^i = y_1^i.   (3.4)
Mathematically, the minimum rate at which CSI must be transmitted is given by

R_c^{NG}(D) = lim_{n→∞} inf_{q ∈ Q_c(D)} (1/n) H(Z_1^n),   (3.5)

where (1/n) H(Z_1^n) is the entropy rate of the encoded sequence in bits. Equation (3.5)
is the causal rate distortion function, as defined by Neuhoff and Gilbert [51], and
is denoted using the superscript NG. Here, the rate distortion function is defined as
a minimization of entropy rate rather than a minimization of mutual information,
as in [24, 60, 63], which is discussed in Section 3.2. The decision to formulate this
problem as a minimization of entropy rate is based on the intuition that the entropy
rate captures the average number of bits per channel use required to convey channel
state information.
3.1.2 Previous Work
As mentioned in Section 1.2.2, several works have used rate distortion-based approaches to characterize limits on required control information. While the traditional
rate-distortion problem has been well studied [6], there have been several works extending these results to Markov Sources [7, 25]. In [34], the authors develop bounds
on the rate distortion function by assuming every kth source symbol is transmitted
noiselessly to the receiver, and using the fact that given those symbols, the rest of
the source symbols can be viewed as independent blocks. The lower bounds presented in [34] can be arbitrarily tight, but at the cost of exponentially increasing
computational complexity.
Additionally, researchers have considered the causal source coding problem due
to its application to real-time processing. One of the first works in this field was [51],
in which Neuhoff and Gilbert show that the best causal encoding of a memoryless
source is a memoryless coding, or a time sharing between two memoryless codes.
However, this result pertains to sources without memory. Neuhoff and Gilbert focus
on the minimization of entropy rate, as in (3.5). The work in [68] studied the optimal
finite-horizon sequential quantization problem, and showed that the optimal encoder
for a kth-order Markov source depends on the last k source symbols and the present
state of the decoder’s memory (i.e. the history of decoded symbols). A similar result
was shown in [66] for an infinite horizon sequential quantization problem.
Later, a causal (sequential) rate distortion theory was introduced in [9] and [63]
for general stationary sources. They show that the sequential rate distortion function
lower bounds the entropy rate of a causally encoded sequence, but this inequality is
strict in general. Despite this, operational significance for the causal rate distortion
function is developed in [63]. Lastly, [60] studies the causal rate distortion function as
a minimization of directed mutual information, and computes the form of the optimal
causal stochastic kernels.
3.2 Rate Distortion Lower Bound
To begin, we review the traditional rate distortion problem. Then, we extend this
formulation by defining the causal information rate distortion function, which is a
minimization of mutual information, and is known to lower bound RcN G (D) [9]. The
causal information rate distortion function provides a lower bound on the required
rate at which CSI must be conveyed to the transmitter to meet the throughput
requirement.
3.2.1 Traditional Rate Distortion
Consider the well known rate distortion problem, in which the goal is to find the
minimum number of bits per source symbol necessary to encode a sequence of source
symbols while meeting a fidelity constraint. Consider a discrete memoryless source
{X_i}_{i=1}^∞, where each X_i is an i.i.d. random variable taking values in the set X,
according to distribution p_X(x). This source sequence is encoded into a sequence
{Z_i}_{i=1}^∞, with Z_i taking values in Z. The distortion between a block of source symbols
and encoded symbols is defined as

d(x_1^N, z_1^N) = (1/N) Σ_{i=1}^N d(x_i, z_i),   (3.6)
where d(xi , zi ) is the per-letter distortion between the source symbol xi and encoded
symbol zi . Define Q(D) to be the family of conditional probability distributions
q(z|x) satisfying
E[d(x, z)] = Σ_{x∈X} Σ_{z∈Z} p_X(x) q(z|x) d(x, z) ≤ D.   (3.7)
Shannon’s rate-distortion theory [15] states that the minimum rate R at which the
source can be encoded with average distortion less than D is given by the information
rate distortion function R(D), where
R(D) ≜ min_{q(z|x)∈Q(D)} I(X; Z),   (3.8)
and I(·; ·) represents mutual information. The rate-distortion function satisfies
min_{q(z|x)∈Q(D)} lim_{n→∞} (1/n) H(Z_1^n) = R(D),   (3.9)
implying that the encoded sequence can be compressed to an average of R(D) bits
per symbol. In other words, the problem of minimizing entropy rate can be solved as
a minimization of mutual information, which is known to be an easier problem, since
the formulation in (3.8) is convex.
3.2.2 Causal Rate Distortion for Opportunistic Scheduling
Now, the previous formulation is extended to the causal setting for the opportunistic
communication problem described in Section 3.1. As discussed above, the information
rate distortion function is a minimization of mutual information over all stochastic
kernels satisfying a distortion constraint. For opportunistic scheduling, this minimization is further constrained to include only causal kernels. Let Qc (D) be the set
of all stochastic kernels q(z1n |xn1 ) satisfying the expected distortion constraint in (3.3)
and the causality constraint in (3.4). The causal information rate distortion function
is defined as

R_c(D) ≜ lim_{n→∞} inf_{q(z_1^n|x_1^n) ∈ Q_c(D)} (1/n) I(X_1^n; Z_1^n).   (3.10)
This definition is the same as that found in [24, 63], as well as in [60] where it is
referred to as a non-anticipatory rate distortion function. As mentioned previously,
the function Rc (D) is a lower bound on the Neuhoff-Gilbert rate distortion function
RcN G (D) in (3.5), and hence a lower bound on the rate of CSI that needs to be conveyed
to the transmitter to ensure expected per-slot throughput is greater than 1 − D. In
the traditional (non-causal) rate distortion framework, this bound is tight; however,
in the causal setting, the minimization of mutual information is potentially very
different than the minimization of entropy rate. This is explored further in Section
3.4. Note that for memoryless sources, Rc (D) = R(D), where R(D) is the traditional
rate distortion function; however, for most memoryless sources, R(D) < RcN G (D).
The optimization problem in (3.10) is solved using a geometric programming dual
as in [13]. The following result gives the structure of the optimal stochastic kernel.
Note that this result is also obtained in [60].
Theorem 14. The optimal kernel q(z_1^n | x_1^n) satisfies

q(z_i | z_1^{i-1}, x_1^i) = Q(z_i | z_1^{i-1}) exp(−λ d(x_i, z_i)) / Σ_{z_i} Q(z_i | z_1^{i-1}) exp(−λ d(x_i, z_i))   (3.11)

where for all z_1^i, Q(z_i | z_1^{i-1}) and λ satisfy

1 = Σ_{x_1^n} [ P(x_1^n) exp(−Σ_{i=1}^n λ d(x_i, z_i)) / Π_{i=1}^n Σ_{z_i} Q(z_i | z_1^{i-1}) exp(−λ d(x_i, z_i)) ]   (3.12)
The proof of Theorem 14 is given in the Appendix. Equation (3.12) holds for all
encodings z1n , and gives a system of equations from which one can solve for Q(zi |z1i−1 ).
This holds in general for any number of Markovian channels, and can be numerically
solved to determine Rc (D). Observe in (3.11) that q(zi |z1i−1 , xi1 ) = q(zi |z1i−1 , xi ). In
other words, the solution to the rate distortion optimization is a distribution which
generates Zi depending on the source sequence only through the current channel state.
This result follows from the Markov property of the channel state sequence.
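For intuition about the exponential form in (3.11), consider the special case of a memoryless source, where the kernel reduces to q(z|x) ∝ Q(z) exp(−λ d(x, z)) with Q the output marginal. The fixed point can then be found by a Blahut-Arimoto-style iteration. The sketch below is such a specialization with illustrative inputs, not the general algorithm implied by (3.12):

```python
import math

def exponential_kernel(px, d, lam, iters=500):
    """Fixed-point iteration for the exponential-form kernel of Theorem 14,
    specialized to a memoryless source (a Blahut-Arimoto-style sketch):
    q(z|x) is proportional to Q(z) * exp(-lam * d(x, z))."""
    X = range(len(px))
    Z = range(len(d[0]))
    Q = [1 / len(d[0])] * len(d[0])  # start from a uniform output marginal
    for _ in range(iters):
        # q(z|x) induced by the current marginal Q
        q = [[Q[z] * math.exp(-lam * d[x][z]) for z in Z] for x in X]
        for x in X:
            s = sum(q[x])
            q[x] = [v / s for v in q[x]]
        # update the marginal: Q(z) = sum_x p(x) q(z|x)
        Q = [sum(px[x] * q[x][z] for x in X) for z in Z]
    return q, Q

# Uniform binary source with Hamming distortion: by symmetry the output
# marginal stays uniform and the kernel is symmetric.
q, Q = exponential_kernel([0.5, 0.5], [[0, 1], [1, 0]], lam=2.0)
assert abs(Q[0] - 0.5) < 1e-9
assert abs(q[0][0] - q[1][1]) < 1e-9
```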
3.2.3 Analytical Solution for Two-Channel System
While Theorem 14 provides a framework to numerically calculate the causal information rate distortion function, for simple problem settings, R_c(D) can be analytically
characterized. Consider the system in Figure 3-1 with two channels (M = 2), where
each channel's evolution over time follows the Markov chain in Figure 3-2. Assume
the Markov chain is symmetric, i.e., p = q, although a similar analysis holds without
this assumption. In this setting, we can obtain a closed-form expression for the causal
information rate distortion function.
Theorem 15. For the aforementioned system, the causal information rate distortion
function is given by

R_c(D) = (1/2) H_b(2p − 4pD + 2D − 1/2) − (1/2) H_b(2D − 1/2)   (3.13)

for all D satisfying 1/4 ≤ D ≤ 1/2.
The proof of Theorem 15 is given in the Appendix, and follows from evaluating
(3.11) and (3.12) for a two channel system, and showing the stationarity of the optimal
kernel using the following Lemma, also proved in the Appendix.

Lemma 9. The optimal values of Q(z_i | z_1^{i-1}) satisfy Q(z_i | z_1^{i-1}) = Q(z_i | z_{i-1}) for all 1 ≤ i ≤ n. Furthermore, for all i,

Q(z_i | z_{i-1}) = 1 − p if z_i = z_{i-1}, and p if z_i ≠ z_{i-1}.   (3.14)

This lemma shows that the optimal distributions q(z_n | z_{n-1}, x_n) and Q(z_n | z_{n-1})
are stationary, and the rate distortion problem can be solved as a minimization over
a single letter. This result only holds for the two-channel opportunistic scheduling
problem.

Figure 3-4: Causal information rate distortion function for different state transition probabilities p for a two channel opportunistic scheduling system.
The information rate distortion function in (3.13) is a lower bound on the rate
at which information needs to be conveyed to the transmitter. A distortion D_min = 1/4
represents a lossless encoding, since for a fraction 1/4 of the time slots both channels
are OFF, and no throughput can be obtained. Additionally, D_max = 1/2 corresponds
to an oblivious encoder, as transmitting over an arbitrary channel requires no CSI,
and achieves distortion equal to 1/2. The function R_c(D) is plotted in Figure 3-4 as a
function of D. As the memory in the channel state process increases (state transition
function of D. As the memory in the channel state process increases (state transition
probability p decreases), the required overhead rate decreases as the transmitter needs
less information to accurately estimate the state of the channels.
3.3 Heuristic Upper Bound
The causal information rate distortion function Rc (D) computed in the previous
section provides a lower bound on the Neuhoff-Gilbert rate distortion function. To
quantify the tightness of the bound, we propose an algorithmic upper bound to RcN G
in (3.5). For simplicity, assume that p = q, and that M = 2, i.e. the transmitter has
Figure 3-5: Definition of K(t), the time since the last change in the sequence Z(t), with respect to the values of Z(t) up to time t.
two symmetric channels over which to transmit. Therefore, X(t) ∈ {00, 01, 10, 11}.
Observe that when X(t) = 11, no distortion is accumulated regardless of the encoding
Z(t), and a unit distortion is always accumulated when X(t) = 00. The minimum
possible average distortion is D_min = 1/4, since the state of the system is 00 for a
fraction 1/4 of the time.

3.3.1 Minimum Distortion Encoding Algorithm
To begin, we present a causal encoding of the state sequence that achieves minimum
distortion. Recall that a causal encoder f(·) satisfies Z(t) = f(X_1^t, Z_1^{t-1}). Consider
the following encoding policy:

Z(t) = Z(t − 1) if X(t) = 00 or X(t) = 11;  Z(t) = 1 if X(t) = 10;  Z(t) = 2 if X(t) = 01.   (3.15)
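The encoding rule above, together with a Monte Carlo estimate of its average distortion, can be sketched as follows (simulation length and seed are arbitrary, with symmetric channels p = q = 0.2):

```python
import random

def encode_min_distortion(xs):
    """Causal encoder (3.15): keep the previous index on 00 or 11, otherwise
    point at the ON channel. xs is a list of (s1, s2) channel-state pairs."""
    zs, z = [], 1  # arbitrary initial index
    for s1, s2 in xs:
        if (s1, s2) == (1, 0):
            z = 1
        elif (s1, s2) == (0, 1):
            z = 2
        # on 00 or 11, z is unchanged
        zs.append(z)
    return zs

def avg_distortion(xs, zs):
    return sum(1 - x[z - 1] for x, z in zip(xs, zs)) / len(xs)

# Monte Carlo check that the encoder attains D_min = 1/4: distortion is
# accumulated exactly when the state is 00, which has probability 1/4.
random.seed(0)
p = 0.2
s, xs = [0, 0], []
for _ in range(100_000):
    s = [1 - si if random.random() < p else si for si in s]
    xs.append(tuple(s))
zs = encode_min_distortion(xs)
assert abs(avg_distortion(xs, zs) - 0.25) < 0.015
```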
Note that Z(t) is a function of Z(t − 1) and X(t), and is therefore a causal encoding as
defined in (3.4). The above encoding achieves a minimum expected distortion equal
to 1/4. Note that the transmitter does not learn the complete channel state through
this encoding, but conveying full CSI requires additional rate with no further reduction
in distortion. Let K(i) be a random variable denoting the number of time slots since
the last change in the sequence Z(i), i.e.,

K(i) = min_j { j < i | Z(i − j) ≠ Z(i − j − 1) }.   (3.16)
Since Z(i) ∈ {1, 2}, K(i) is interpreted as the length of the current run of ones
or twos in the encoded sequence, as illustrated in Figure 3-5. Thus, at each time, the
transmitter is able to infer the state of the system up to K(i) slots ago. Since the
channel state is Markovian, the entropy rate of the sequence Z_1^∞ is expressed as
lim_{n→∞} (1/n) H(Z_1^n) = lim_{n→∞} (1/n) Σ_{i=1}^n H(Z(i) | Z_1^{i-1})   (3.17)
= H(Z(i) | Z(i − 1), K(i))   (3.18)
= Σ_{k=1}^∞ P(K = k) H_b( P(Z(i) ≠ Z(i − 1) | K = k) )   (3.19)
where Hb (·) is the binary entropy function. Note by definition, Z(i − 1) = Z(i − K(i))
in (3.18). Equation (3.19) can be computed numerically in terms of the transition
probabilities of the Markov chain in Figure 3-2.
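Equation (3.19) can likewise be evaluated empirically: simulate the encoded sequence Z(i), collect the distribution of the run length K and the conditional change probabilities, and sum the weighted binary entropies. The sketch below does this for p = q = 0.2 (illustrative values and seed):

```python
import math
import random
from collections import Counter

def Hb(x):
    return 0.0 if x <= 0 or x >= 1 else -x * math.log2(x) - (1 - x) * math.log2(1 - x)

# Simulate Z(i) for two symmetric channels and estimate the entropy rate
# via (3.19): sum over k of P(K = k) * Hb(P(change | K = k)).
random.seed(1)
p, n = 0.2, 200_000
s, z, run = [0, 1], 2, 1          # start consistent with state 01 -> Z = 2
runs_seen, changes = Counter(), Counter()
for _ in range(n):
    s = [1 - si if random.random() < p else si for si in s]
    if s == [1, 0]:
        z_new = 1
    elif s == [0, 1]:
        z_new = 2
    else:
        z_new = z                  # hold on 00 and 11
    runs_seen[run] += 1            # current run length plays the role of K(i)
    if z_new != z:
        changes[run] += 1          # Z changed given K = run
        run = 1
    else:
        run += 1
    z = z_new

rate = sum((runs_seen[k] / n) * Hb(changes[k] / runs_seen[k]) for k in runs_seen)
assert 0.0 < rate < 1.0            # a nontrivial rate strictly below one bit/slot
```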
3.3.2 Threshold-based Encoding Algorithm
In order to further reduce the rate of the encoded sequence below that of the encoder in
(3.15), a higher expected distortion must be tolerated. We now modify the encoding
algorithm in (3.15) by introducing a threshold parameter T, as follows:
If K(i) ≤ T , then Z(i) = Z(i − 1), and if K(i) > T , then Z(i) is assigned according
to (3.15). As a result, for the first T slots after the Z(i) sequence changes value,
the transmitter can determine the next element of the sequence deterministically,
and hence the sequence is encoded with zero rate. After T slots, the entropy in the
Z(i) process is similar to that of the original encoding algorithm. As expected, this
reduction in entropy rate comes at an increase in distortion. In the first T slots after
a change to Z(i) = 1, every visit to state X(i) = 01 or X(i) = 00 incurs a unit
distortion. Therefore, the accumulated distortion is equal to the number of visits to
those states in an interval of T slots.
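A sketch of the threshold-based encoder, illustrating that the expected distortion grows with T (parameter values and seed are arbitrary, with symmetric channels p = q = 0.2):

```python
import random

def threshold_encode(xs, T):
    """Threshold-based causal encoder: hold Z fixed for the first T slots
    after each change, then fall back to the minimum-distortion rule (3.15)."""
    zs, z, run = [], 1, T + 1  # run = slots since the last change in Z
    for s1, s2 in xs:
        if run > T:
            if (s1, s2) == (1, 0):
                z_new = 1
            elif (s1, s2) == (0, 1):
                z_new = 2
            else:
                z_new = z
        else:
            z_new = z          # within the hold window, Z is deterministic
        run = 1 if z_new != z else run + 1
        z = z_new
        zs.append(z)
    return zs

random.seed(2)
p = 0.2
s, xs = [0, 0], []
for _ in range(100_000):
    s = [1 - si if random.random() < p else si for si in s]
    xs.append(tuple(s))

def D(zs):
    return sum(1 - x[z - 1] for x, z in zip(xs, zs)) / len(xs)

# T = 0 reduces to the minimum-distortion encoder; larger T costs distortion.
d0, d5 = D(threshold_encode(xs, 0)), D(threshold_encode(xs, 5))
assert abs(d0 - 0.25) < 0.02
assert d0 <= d5 <= 0.5
```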
Clearly, as the parameter T increases, the entropy rate decreases, and the expected distortion increases. Consequently, T parameterizes the rate-distortion curve;
however, due to the integer restriction, only a countable number of rate-distortion
pairs are achievable by varying T . To generate the full R-D curve, time sharing is
used to interpolate between the points parameterized by T . An example curve is
shown in Figure 3-6, for p = q = 0.2. Note that as T increases, the corresponding
points on the R(D) curve become more dense. Furthermore, for the region of R(D)
parameterized by large T , the function R(D) is linear. The slope of this linear region
is characterized by the following result.
Proposition 4. Let R(T) and D(T) denote the rate and expected distortion as functions of the parameter T, respectively. For large T, the achievable R(D) curve for the
above encoding algorithm, given by the points (D(T), R(T)), has slope

lim_{T→∞} [ R(T + 1) − R(T) ] / [ D(T + 1) − D(T) ] = −H(M) / ( c + (1/4) E[M] ),   (3.20)

where M is a random variable denoting the number of slots after the initial
T slots until the Z_i sequence changes value, and c is a constant given by

c = Σ_{i=1}^T ( E[1(X_i = 00 or X_i = 01)] − E[1(X_i = 00 or X_i = 01) | X_0 = 10] ).   (3.21)
The proof of Proposition 4 is given in the Appendix. The constant in (3.21) represents the difference in expected accumulated distortion over an interval of T slots
between the state process beginning in steady state and the state process beginning
at state X_0 = 10. Proposition 4 shows that the slope of R(D) is independent of T for
T sufficiently large. As T grows large, the value of Z_i changes rarely, and therefore
distortion is accumulated in half of the states. Hence, at zero rate, a distortion
of 1/2 is attainable. Figure 3-6 plots the algorithmic upper bound as a function of
distortion by varying the parameter T from 1 to 20 and time sharing between these
points. Data points are computed using Monte Carlo simulation. Additionally, Figure 3-6 shows the causal information rate distortion function for the same channel
transition probabilities for comparison.
Figure 3-6: The causal information rate distortion function R_c(D) (Section 3.2) and the upper bound to the rate distortion function (Section 3.3), computed using Monte Carlo simulation. Transition probabilities satisfy p = q = 0.2.
3.4 Causal Rate Distortion Gap
Figure 3-6 shows a gap between the causal information rate distortion function and the heuristic upper bound to the Neuhoff-Gilbert rate distortion function computed in Section 3.3. In this section, we prove that for a class of distortion metrics including the throughput metric in (3.2), there exists a gap between the information and Neuhoff-Gilbert causal rate distortion functions, even at D = Dmin.

To illustrate this concept, consider a discrete memoryless source {X_i}, drawing i.i.d. symbols from the alphabet {0, 1, 2}, and an encoded sequence {Z_i} drawn from {0, 1, 2}. Consider the following distortion metrics: d_1(x, z) = 1_{z≠x} and d_2(x, z) = 1_{z=x}, where 1 is the indicator function. The first metric d_1(x, z) is a simple Hamming distortion measure, used to minimize the probability of error, and the second metric is an inverse Hamming measure. Note that for the second distortion metric, there exist two distortion-free encodings for each source symbol. The causal rate distortion functions Rc(D) for d_1(x, z) and d_2(x, z) are computed using the results of Theorem 14, and the fact that q(z_i | z_1^{i−1}, x_i) = q(z_i | x_i) due to the memoryless property of the source:

Rc,1(D) = −Hb(D) − D log(2/3) − (1 − D) log(1/3),   0 ≤ D ≤ 2/3,   (3.22)

Rc,2(D) = −Hb(D) − D log(1/3) − (1 − D) log(2/3),   0 ≤ D ≤ 1/3.   (3.23)
Additionally, Neuhoff and Gilbert [51] show that for a memoryless source, Rc^{NG} equals the lower convex envelope of all memoryless encoders for this source. The relevant memoryless encoders that lie on the convex envelope are the minimum-rate, zero-distortion encoder, and the oblivious encoder, which always outputs the same index, requiring zero rate and accumulating high distortion. Thus, the entropy-rate based rate distortion functions are given by

Rc,1^{NG}(D) = (1 − (3/2)D) log 3,   0 ≤ D ≤ 2/3,   (3.24)

Rc,2^{NG}(D) = (1 − 3D) Hb(1/3),   0 ≤ D ≤ 1/3.   (3.25)
The information and Neuhoff-Gilbert rate distortion functions for the two metrics are plotted in Figure 3-7. Note that for both distortion metrics, the causal rate distortion function is not operationally achievable. Furthermore, in the lossless encoding case (D = Dmin), there is a gap between the two rate distortion functions under the second distortion metric, but not under the Hamming distortion metric. This gap arises when, for a state x, there exist multiple encodings z that can be used with no distortion penalty. This observation is formalized in the following result.
Theorem 16. Let {X_i} represent an i.i.d. discrete memoryless source with alphabet X, encoded into a sequence {Z_i} taken from alphabet Z, subject to a per-letter distortion metric d(x_i, z_i). Furthermore, suppose there exist x_1, x_2, y ∈ X and z_1, z_2 ∈ Z, such that z_1 ≠ z_2 and

a) P(x_1) > 0, P(x_2) > 0, P(y) > 0,

b) z_1 is the minimizer z_1 = arg min_z d(x_1, z),

c) z_2 is the minimizer z_2 = arg min_z d(x_2, z),

d) d(y, z_1) = d(y, z_2) = min_z d(y, z).

Then Rc^{NG}(Dmin) > Rc(Dmin).

Figure 3-7: Rate distortion functions for the example systems: (a) distortion d_1(x, z); (b) distortion d_2(x, z).
Proof. By [51], there exists a deterministic function f : X → Z such that

Rc^{NG}(Dmin) = H(f(X)),   (3.26)

E[d(X, f(X))] = Dmin.   (3.27)

Define a randomized encoding q(z|x), where z = f(x) for all x ≠ y, and the source symbol y is encoded randomly into z_1 or z_2 with equal probability. Consequently, H(Z|X) > 0, and H(Z) > I_q(X; Z) under the encoding q(z|x). Note that the new encoding also satisfies E_q[d(X, Z)] = Dmin. To conclude,

R^{NG}(Dmin) = H(f(X)) > I_q(X; Z) ≥ R(Dmin) = Rc(Dmin).   (3.28)
Theorem 16 shows that if there exists only one deterministic mapping f : X → Z resulting in minimum distortion, then there is no gap between the Neuhoff-Gilbert rate distortion function and the causal information rate distortion function at Dmin. However, when there are multiple deterministic mappings that achieve minimum distortion, a randomized combination of them results in lower mutual information, creating a gap between the minimization of mutual information and the minimization of entropy rate. Note that the throughput distortion metric in (3.2) satisfies the conditions of Theorem 16, since any encoding that returns the index of an ON channel incurs no distortion (it results in a successful transmission).
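The mechanism behind Theorem 16 can be seen numerically for the ternary example above. The mapping below is one hypothetical choice of the distortion-free map f; randomizing a single symbol over its two distortion-free outputs keeps D = Dmin = 0 while pushing I(X; Z) strictly below H(f(X)) = log 3:

```python
import math

def entropy(dist):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in dist if p > 0)

px = [1/3, 1/3, 1/3]          # uniform ternary source
# Under d2(x, z) = 1{z = x}, any z != x is distortion-free.  Take the
# (hypothetical) map f(0)=1, f(1)=2, f(2)=0 and randomize only x = 0
# over its two distortion-free encodings {1, 2}.
q = {0: {1: 0.5, 2: 0.5}, 1: {2: 1.0}, 2: {0: 1.0}}   # q(z|x)

pz = [sum(px[x] * q[x].get(z, 0.0) for x in q) for z in range(3)]
h_z_given_x = sum(px[x] * entropy(q[x].values()) for x in q)
mi = entropy(pz) - h_z_given_x   # I(X;Z) under the randomized encoding

assert mi < math.log2(3) - 0.4   # strictly below H(f(X)) = log 3
```

Every output in the table has z ≠ x, so the expected distortion is still Dmin = 0, yet the entropy-rate cost of any deterministic zero-distortion encoder remains log 3.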
While the above result proves that the causal information rate distortion function is not tight, it is still possible to provide an operational interpretation of Rc(D) in (3.60). In [63], the author proves that for a source {X_{n,t}} which is Markovian across the time index t, yet i.i.d. across the spatial index n, there exist blocks of sufficiently large t and n such that the causal rate distortion function is operationally achievable, i.e., the information and Neuhoff-Gilbert rate distortion functions are equal. In the opportunistic scheduling setting, this is equivalent to a transmitter sending N messages to the receiver, where each transmission is assigned a disjoint subset of the channels over which to transmit. However, this restriction results in a reduced throughput, as separate transmitters must be restricted from selecting channels belonging to another transmitter's subset of channels. Relaxing this restriction improves throughput, but reintroduces the gap between the rate distortion functions.
3.5 Application to Channel Probing
In Chapter 2, we analyzed optimal probing schemes to maximize throughput in an
opportunistic communication system. In this section, channel probing is interpreted
as an encoding of CSI to be sent to the transmitter. We compare the information
overhead under the channel probing framework to the lower bound found in Section
3.2, to evaluate channel probing as a strategy for acquiring CSI.
Consider a system with two channels, where each channel evolves over time according to the Markov chain in Figure 2. Assume the transmitter probes one of the channels every T slots, and uses the CSI gathered from channel probes to opportunistically schedule a transmission. A small probing interval T corresponds to a high rate of CSI acquisition at the transmitter, but the frequent availability of CSI leads to a high system throughput. On the other hand, large probing intervals lead to lower throughput, but CSI updates are sent less frequently. Therefore, by computing the entropy rate of the information obtained by the probing process, together with the achievable throughput, the probing interval T parameterizes a rate-distortion curve which can be compared with the rate distortion lower bound in Theorem 15.

In Chapter 2, it is shown that for fixed probing intervals over a two-channel system, the policy which probes channel 1 at each probe is optimal. As in Section 3.1, let Z(t) be the index of the channel that the transmitter activates. If a probe does not occur at time t, then the transmitter uses the same channel as in the previous slot, as no new CSI has been gathered, so Z(t) = Z(t − 1). On the other hand, if a probe occurs, Z(t) is computed based on the result of the probe: if channel 1 is ON, then Z(t) = 1, and if it is OFF, Z(t) = 2. The entropy rate of the sequence Z_1^∞ is expressed as
lim_{n→∞} (1/n) H(Z_1^n) = lim_{n→∞} (1/n) Σ_{i=1}^n H(Z(i) | Z_1^{i−1})   (3.29)

= lim_{k→∞} (1/(kT)) Σ_{i=1}^k Σ_{j=0}^{T−1} H(Z(ki + j) | Z_1^{ki+j−1})   (3.30)

= lim_{k→∞} (1/k) Σ_{i=1}^k (1/T) H(Z(ki) | Z_1^{ki−1})   (3.31)

= (1/T) H(S(T) | S(0)).   (3.32)
Equation (3.30) follows by breaking the sequence of time slots up into separate channel probes. Equation (3.31) is a simplification based on the fact that only one probe occurs every T slots, and this is the only slot in which CSI is conveyed. Lastly, (3.32) follows since the entropy of Z(t) at each probing instance t is equal to the entropy of the result of the probe, which, by the Markov property of the channel state, depends on the past probes only through the result of the previous probe. Equation (3.32) is evaluated using the channel statistics as follows:
(1/T) H(S(T) | S(0)) = [ (1 − π) H(S(T) | S(0) = 0) + π H(S(T) | S(0) = 1) ] / T   (3.33)

= [ (1 − π) Hb(p_{01}^T) + π Hb(p_{11}^T) ] / T,   (3.34)

where π is the steady-state probability of a channel being ON, p_{s,1}^k is the k-step transition probability of the Markov chain from state s to state 1, and Hb(·) is the binary entropy function.
Using the throughput distortion constraint in (3.2), the expected distortion is given by one minus the expected per-slot throughput. In Chapter 2, it is shown that the average per-slot throughput in this system is given by

E[Thpt] = π + π p_{10}^T / (T(p + q)),   (3.35)

which implies the expected distortion is given by

E[D] = (1 − π) − π p_{10}^T / (T(p + q)).   (3.36)

In summary, the probing interval T parameterizes a rate distortion curve (R(T), D(T)) given by

(R(T), D(T)) = ( [ (1 − π) Hb(p_{01}^T) + π Hb(p_{11}^T) ] / T ,  (1 − π) − π p_{10}^T / (T(p + q)) ).   (3.37)
In Figure 3-8, this rate distortion curve is plotted along with the causal information rate distortion lower bound and the heuristic upper bound of Figure 3-6, yielding a new algorithmic upper bound on the rate at which CSI must be conveyed. The probing policy does not perform as well as the heuristic upper bound of Section 3.3, which suggests that this channel probing policy is not an efficient method of acquiring CSI. However, from the analysis in Chapter 2, it is known that the throughput of probing policies can be increased by dynamically optimizing the probing intervals based
Figure 3-8: Causal information rate distortion lower bound, heuristic upper bound, and probing algorithmic upper bound for a two-channel system with p = q = 0.2.
on the result of each channel probe. Using the state action frequency approach of Chapter 2 to compute the optimal dynamic probing intervals would improve performance; however, due to the complexity of dynamically optimizing the probing intervals, deriving an analytic expression for the resulting rate-distortion tradeoff is difficult.
3.6 Summary
In this chapter, we considered an opportunistic communication system in which a transmitter selects one of multiple channels over which to schedule a transmission, based on partial knowledge of the network state. We characterized a fundamental limit on the rate at which CSI must be conveyed to the transmitter in order to meet a constraint on expected throughput, by modeling the problem as a causal rate distortion optimization of a Markov source. We introduced a novel distortion metric which measures the impact of a particular CSI encoding on throughput.

For the case of a two-channel system, we derived a closed-form expression for the causal information rate distortion lower bound. Furthermore, we proposed an algorithmic upper bound to compare to the lower bound. The gap between the two bounds was characterized, and we proved that this gap is inherent in using a causal encoding for channel state information. Lastly, we characterized the rate-distortion performance of the probing scheme of Chapter 2 and compared it to the rate distortion lower bound.
3.7 Appendix

3.7.1 Proof of Theorem 14
Theorem 14: The optimal kernel q(z_1^n | x_1^n) satisfies

q(z_i | z_1^{i−1}, x_1^i) = Q(z_i | z_1^{i−1}) exp(−λd(x_i, z_i)) / Σ_{z_i} Q(z_i | z_1^{i−1}) exp(−λd(x_i, z_i)),   (3.38)

where for all z_1^i, Q(z_i | z_1^{i−1}) and λ satisfy

1 = Σ_{x_1^n} P(x_1^n) exp(−Σ_{i=1}^n λd(x_i, z_i)) / Π_{i=1}^n Σ_{z_i} Q(z_i | z_1^{i−1}) exp(−λd(x_i, z_i)).   (3.39)
Proof of Theorem 14. Any stochastic kernel q(z_1^n | x_1^n) ∈ Q_c(D) can be decomposed as

q(z_1^n | x_1^n) = Π_{i=1}^n q(z_i | z_1^{i−1}, x_1^i),   (3.40)

using the causality of the distribution. The rate-distortion optimization is given by

Min.  (1/n) Σ_{x_1^n} Σ_{z_1^n} p(x_1^n) q(z_1^n | x_1^n) log [ q(z_1^n | x_1^n) / Σ_{x̂_1^n} p(x̂_1^n) q(z_1^n | x̂_1^n) ]   (3.41)

s.t.  (1/n) Σ_{x_1^n} Σ_{z_1^n} p(x_1^n) q(z_1^n | x_1^n) Σ_{i=1}^n d(x_i, z_i) ≤ D,   (3.42)

Σ_{z_1^i} q(z_1^i | x_1^i) = 1   ∀i, x_1^i, z_1^{i−1},   (3.43)

q(z_1^i | x_1^i) ≥ 0   ∀i, x_1^i, z_1^i.   (3.44)
The objective in (3.41) is the definition of mutual information. Equation (3.42) is the per-letter average distortion constraint. Equations (3.43) and (3.44) ensure that q(z_1^n | x_1^n) is a valid causal probability distribution. To see this, consider the constraints ordered in time. For i = 1,

Σ_{z_1} q(z_1 | x_1) = 1;   q(z_1 | x_1) ≥ 0   ∀x_1.   (3.45)
Using an inductive argument, if q(z_1^{i−1} | x_1^{i−1}) is a valid distribution, then the constraint

1 = Σ_{z_1^i} q(z_1^i | x_1^i) = Σ_{z_i} Σ_{z_1^{i−1}} q(z_i | z_1^{i−1}, x_1^i) q(z_1^{i−1} | x_1^{i−1})   (3.46)

= Σ_{z_i} q(z_i | z_1^{i−1}, x_i)   (3.47)

ensures that q(z_i | z_1^{i−1}, x_i) and q(z_1^i | x_1^i) are valid distributions. Thus, Π_{i=1}^n q(z_i | z_1^{i−1}, x_i) is a valid distribution.
The Lagrangian for the above optimization is derived by relaxing constraint (3.42) with dual variable λ and constraints (3.43) with dual variables μ_i(x_1^i, z_1^{i−1}):

L(q, λ, μ_i) = (1/n) Σ_{x_1^n} Σ_{z_1^n} p(x_1^n) q(z_1^n | x_1^n) log [ q(z_1^n | x_1^n) / Σ_{x̂_1^n} p(x̂_1^n) q(z_1^n | x̂_1^n) ]

+ λ [ (1/n) Σ_{x_1^n} Σ_{z_1^n} p(x_1^n) q(z_1^n | x_1^n) Σ_{i=1}^n d(x_i, z_i) − D ]

− Σ_{i=1}^n Σ_{x_1^n} Σ_{z_1^n} μ_i(x_1^i, z_1^{i−1}) (q(z_1^n | x_1^n) − 1).   (3.48)
z1n
Differentiating equation (3.48) with respect to each q(z_1^n | x_1^n) and equating to zero yields

∂L(q, λ, μ_i)/∂q(z_1^n | x_1^n) = (1/n) p(x_1^n) log [ q(z_1^n | x_1^n) / Σ_{x̂_1^n} p(x̂_1^n) q(z_1^n | x̂_1^n) ]

+ λ p(x_1^n) (1/n) Σ_{i=1}^n d(x_i, z_i) − Σ_{i=1}^n μ_i(x_1^i, z_1^{i−1}) = 0.   (3.49)
Let α^i(x_1^n, z_1^{i−1}) = −n μ_i(x_1^i, z_1^{i−1}) / p(x_1^n), and let Q(z_1^n) = Σ_{x_1^n} p(x_1^n) q(z_1^n | x_1^n). Solving (3.49) for q(z_1^n | x_1^n) yields

q(z_1^n | x_1^n) = Q(z_1^n) exp( − Σ_{i=1}^n (λd(x_i, z_i) + α^i(x_1^n, z_1^{i−1})) ).   (3.50)
From (3.40),

Π_{i=1}^n q(z_i | z_1^{i−1}, x_1^i) = Q(z_1^n) exp( − Σ_{i=1}^n (λd(x_i, z_i) + α^i(x_1^n, z_1^{i−1})) )   (3.51)

= Π_{i=1}^n Q(z_i | z_1^{i−1}) exp(−λd(x_i, z_i)) exp(−α^i(x_1^n, z_1^{i−1})),   (3.52)

and hence

q(z_i | z_1^{i−1}, x_1^i) = Q(z_i | z_1^{i−1}) exp(−λd(x_i, z_i)) exp(−α^i(x_1^n, z_1^{i−1})).   (3.53)
Summing (3.53) over z_i yields

1 = Σ_{z_i} q(z_i | z_1^{i−1}, x_1^i) = Σ_{z_i} Q(z_i | z_1^{i−1}) exp(−λd(x_i, z_i)) exp(−α^i(x_1^n, z_1^{i−1})),   (3.54)

α^i(x_1^n, z_1^{i−1}) = log Σ_{z_i} Q(z_i | z_1^{i−1}) exp(−λd(x_i, z_i)).   (3.55)
Plugging (3.55) into (3.50) gives an equation for the optimal stochastic kernel q(z_1^n | x_1^n):

q(z_1^n | x_1^n) = Q(z_1^n) exp( − Σ_{i=1}^n λd(x_i, z_i) ) / Π_{i=1}^n Σ_{z_i} Q(z_i | z_1^{i−1}) exp(−λd(x_i, z_i)),   (3.56)

and

q(z_i | z_1^{i−1}, x_1^i) = Q(z_i | z_1^{i−1}) exp(−λd(x_i, z_i)) / Σ_{z_i} Q(z_i | z_1^{i−1}) exp(−λd(x_i, z_i)).   (3.57)

To solve for the optimal stochastic kernels, multiply (3.56) by p(x_1^n) and sum over all values of x_1^n:
Σ_{x_1^n} p(x_1^n) q(z_1^n | x_1^n) = Σ_{x_1^n} P(x_1^n) Q(z_1^n) exp( − Σ_{i=1}^n λd(x_i, z_i) ) / Π_{i=1}^n Σ_{z_i} Q(z_i | z_1^{i−1}) exp(−λd(x_i, z_i)),   (3.58)

1 = Σ_{x_1^n} P(x_1^n) exp( − Σ_{i=1}^n λd(x_i, z_i) ) / Π_{i=1}^n Σ_{z_i} Q(z_i | z_1^{i−1}) exp(−λd(x_i, z_i)).   (3.59)
3.7.2 Proof of Theorem 15

Theorem 15: For the aforementioned system, the causal information rate distortion function is given by

Rc(D) = (1/2) Hb( 2p − 4pD + 2D − 1/2 ) − (1/2) Hb( 2D − 1/2 ),   (3.60)

for all D satisfying 1/4 ≤ D ≤ 1/2.
Proof of Theorem 15. From Theorem 14, the optimizing stochastic kernel q(z_1^n | x_1^n) satisfies (3.11). To begin, consider the conditional distribution of the first symbol in the encoded sequence, z_1:

q(z_1 | x_1) = Q(z_1) e^{−λd(x_1,z_1)} / [ Q(Z_1 = 1) e^{−λd(x_1,Z_1=1)} + Q(Z_1 = 2) e^{−λd(x_1,Z_1=2)} ]   ∀z_1.   (3.61)

Multiplying both sides by P(x_1) and summing over all values of x_1,

1 = Σ_{x_1} P(x_1) e^{−λd(x_1,z_1)} / [ Q(Z_1 = 1) e^{−λd(x_1,Z_1=1)} + Q(Z_1 = 2) e^{−λd(x_1,Z_1=2)} ]   ∀z_1.   (3.62)

The two equations in (3.62) are solved for the two unknowns Q(Z_1 = 1) and Q(Z_1 = 2), denoted Q_1 and Q_2 respectively, using the fact that d(x_i, z_i) = 1 − s_{z_i} and P(X_1) = 1/4 for all X_1:

e^{−λ} / (Q_1 e^{−λ} + Q_2) + 1 / (Q_1 + Q_2 e^{−λ}) = 1 / (Q_1 e^{−λ} + Q_2) + e^{−λ} / (Q_1 + Q_2 e^{−λ}),   (3.63)

Q_1 = Q_2 = 1/2.   (3.64)
Now, consider the conditional distribution of the first two symbols in the encoded sequence, z_1 and z_2. From (3.56),

q(z_1^2 | x_1^2) = Q(z_1^2) exp( − Σ_{i=1}^2 λd(x_i, z_i) ) / Π_{i=1}^2 Σ_{z_i} Q(z_i | z_1^{i−1}) exp(−λd(x_i, z_i)),   (3.65)

1 = Σ_{x_1^2} P(x_1^2) exp( − Σ_{i=1}^2 λd(x_i, z_i) ) / Π_{i=1}^2 Σ_{z_i} Q(z_i | z_1^{i−1}) exp(−λd(x_i, z_i))   (3.66)

= Σ_{x_2} [ P(x_2) e^{−λd(x_2,z_2)} / Σ_{z_2} Q(z_2 | z_1) e^{−λd(x_2,z_2)} ] Σ_{x_1} [ P(x_1 | x_2) e^{−λd(x_1,z_1)} / Σ_{z_1} Q(z_1) e^{−λd(x_1,z_1)} ].   (3.67)
1
Let f(x_2, z_1) be equal to the last term in (3.67), i.e.,

f(x_2, z_1) = Σ_{x_1} P(x_1 | x_2) e^{−λd(x_1,z_1)} / Σ_{z_1} Q(z_1) e^{−λd(x_1,z_1)} = Σ_{x_1} 2 P(x_1 | x_2) e^{−λd(x_1,z_1)} / [ e^{−λd(x_1,Z_1=1)} + e^{−λd(x_1,Z_1=2)} ],   (3.68)
where the last equality follows from (3.64). The four equations in (3.67) can be solved for the four unknowns Q(z_2 | z_1). First consider Z_1 = 1, and let Q(Z_2 = i | Z_1 = j) = Q_{i|j}:

Σ_{x_2} (1/2) e^{−λd(x_2,Z_2=1)} f(x_2, z_1 = 1) / [ Q_{1|1} e^{−λd(x_2,Z_2=1)} + Q_{2|1} e^{−λd(x_2,Z_2=2)} ] = Σ_{x_2} (1/2) e^{−λd(x_2,Z_2=2)} f(x_2, z_1 = 1) / [ Q_{1|1} e^{−λd(x_2,Z_2=1)} + Q_{2|1} e^{−λd(x_2,Z_2=2)} ].   (3.69)
Since f(x_2, z_1 = 1) = f(x_2, z_1 = 2) if x_2 = 00 or 11, the above equality simplifies to

(e^{−λ} − 1) f(x_2 = 01, z_1 = 1) / (Q_{1|1} e^{−λ} + Q_{2|1}) = (e^{−λ} − 1) f(x_2 = 10, z_1 = 1) / (Q_{1|1} + Q_{2|1} e^{−λ}).   (3.70)

Using (3.68),

f(x_2 = 01, z_1 = 1) = f(x_2 = 10, z_1 = 2) = [ (1 − p) e^{−λ} + p ] / [ (1/2)(e^{−λ} + 1) ],   (3.71)

f(x_2 = 01, z_1 = 2) = f(x_2 = 10, z_1 = 1) = [ p e^{−λ} + (1 − p) ] / [ (1/2)(e^{−λ} + 1) ].   (3.72)
Combining (3.71) and (3.72) with (3.70),

f(x_2 = 01, z_1 = 1) / (Q_{1|1} e^{−λ} + Q_{2|1}) = f(x_2 = 10, z_1 = 1) / (Q_{1|1} + Q_{2|1} e^{−λ}),   (3.73)

Q_{1|1} p(1 − e^{−2λ}) / [ (1/2)(e^{−λ} + 1) ] = Q_{2|1} (1 − p)(1 − e^{−2λ}) / [ (1/2)(e^{−λ} + 1) ],   (3.74)

p Q_{1|1} = (1 − p) Q_{2|1}.   (3.75)

By plugging (3.75) into (3.67), it follows that Q_{1|1} = (1 − p) and Q_{2|1} = p. A similar analysis for the case Z_1 = 2 yields Q_{1|2} = p and Q_{2|2} = (1 − p). The above analysis can be used to solve for Q(z_i | z_1^{i−1}) for any i. From (3.67), it follows that
1 = Σ_{x_{i+1}} [ P(x_{i+1}) e^{−λd(x_{i+1},z_{i+1})} / Σ_{z_{i+1}} Q(z_{i+1} | z_1^i) e^{−λd(x_{i+1},z_{i+1})} ] Σ_{x_i} [ P(x_i | x_{i+1}) e^{−λd(x_i,z_i)} / Σ_{z_i} Q(z_i | z_{i−1}) e^{−λd(x_i,z_i)} ] · · · Σ_{x_1} [ P(x_1 | x_2) e^{−λd(x_1,z_1)} / Σ_{z_1} Q(z_1) e^{−λd(x_1,z_1)} ].   (3.76)

Define f(x_1^{i+1}, z_1^i) as

f(x_1^{i+1}, z_1^i) = Σ_{x_i} [ P(x_i | x_{i+1}) e^{−λd(x_i,z_i)} / Σ_{z_i} Q(z_i | z_{i−1}) e^{−λd(x_i,z_i)} ] · · · Σ_{x_1} [ P(x_1 | x_2) e^{−λd(x_1,z_1)} / Σ_{z_1} Q(z_1) e^{−λd(x_1,z_1)} ].   (3.77)
i
Lemma 9: The optimal values satisfy Q(z_i | z_1^{i−1}) = Q(z_i | z_{i−1}) for all 1 ≤ i ≤ n. Furthermore, for all i,

Q(z_i | z_{i−1}) = 1 − p if z_i = z_{i−1}, and Q(z_i | z_{i−1}) = p if z_i ≠ z_{i−1}.   (3.78)

This lemma shows that the optimal distributions q(z_n | z_{n−1}, x_n) and Q(z_n | z_{n−1}) are stationary, and the rate distortion problem can be solved as a minimization over a single letter. It is proved simultaneously with the following lemma.
Lemma 10. The function f(x_1^{i+1}, z_1^i) satisfies

f(x_1^{i+1}, z_1^i) = f(x_{i+1}, z_i) = Σ_{x_i} P(x_i | x_{i+1}) e^{−λd(x_i,z_i)} / Σ_{z_i} Q(z_i) e^{−λd(x_i,z_i)}.   (3.79)
Proof of Lemma 10. This is an inductive proof on the index i, with the base case (i = 2) provided in (3.71) and (3.72). Assume

f(x_1^i, z_1^{i−1}) = f(x_i, z_{i−1}) = Σ_{x_{i−1}} P(x_{i−1} | x_i) e^{−λd(x_{i−1},z_{i−1})} / Σ_{z_{i−1}} Q(z_{i−1}) e^{−λd(x_{i−1},z_{i−1})}   (3.80)

holds for i; we will prove it holds for i + 1. Note that from (3.80) and (3.79),

f(x_i = 00, z_{i−1}) = 1,   (3.81)

f(x_i = 11, z_{i−1}) = 1,   (3.82)

f(x_i = 01, z_{i−1}) = [ Q(Z_i = 1 | z_{i−1}) e^{−λ} + Q(Z_i = 2 | z_{i−1}) ] / [ (1/2)(e^{−λ} + 1) ],   (3.83)

f(x_i = 10, z_{i−1}) = [ Q(Z_i = 1 | z_{i−1}) + Q(Z_i = 2 | z_{i−1}) e^{−λ} ] / [ (1/2)(e^{−λ} + 1) ].   (3.84)
Therefore, for i + 1,

f(x_1^{i+1}, z_1^i) = Σ_{x_i} [ P(x_i | x_{i+1}) e^{−λd(x_i,z_i)} / Σ_{z_i} Q(z_i | z_{i−1}) e^{−λd(x_i,z_i)} ] f(x_i, z_{i−1})   (3.85)

= Σ_{x_i} [ P(x_i | x_{i+1}) e^{−λd(x_i,z_i)} / Σ_{z_i} Q(z_i | z_{i−1}) e^{−λd(x_i,z_i)} ] Σ_{x_{i−1}} [ P(x_{i−1} | x_i) e^{−λd(x_{i−1},z_{i−1})} / Σ_{z_{i−1}} Q(z_{i−1}) e^{−λd(x_{i−1},z_{i−1})} ]   (3.86)

= P(X_i = 00 | x_{i+1}) + P(X_i = 11 | x_{i+1})
+ P(X_i = 01 | x_{i+1}) e^{−λd(01,z_i)} f(01, z_{i−1}) / [ Q(Z_i = 1 | Z_{i−1}) e^{−λ} + Q(Z_i = 2 | Z_{i−1}) ]
+ P(X_i = 10 | x_{i+1}) e^{−λd(10,z_i)} f(10, z_{i−1}) / [ Q(Z_i = 1 | Z_{i−1}) + Q(Z_i = 2 | Z_{i−1}) e^{−λ} ]   (3.87)

= P(X_i = 00 | x_{i+1}) + P(X_i = 11 | x_{i+1})
+ P(X_i = 01 | x_{i+1}) e^{−λd(01,z_i)} / [ (1/2)(e^{−λ} + 1) ] + P(X_i = 10 | x_{i+1}) e^{−λd(10,z_i)} / [ (1/2)(e^{−λ} + 1) ]   (3.88)

= Σ_{x_i} P(x_i | x_{i+1}) e^{−λd(x_i,z_i)} / Σ_{z_i} Q(z_i) e^{−λd(x_i,z_i)},   (3.89)

where (3.86) follows from the induction hypotheses of Lemmas 10 and 9, and (3.88) follows from (3.81)-(3.84).
Proof of Lemma 9. The proof follows by induction. The base case, i = 2, is proven above. Now, assume (3.78) holds for i; we will prove it holds for i + 1. From Lemma 10, (3.76) is rewritten as

1 = Σ_{x_{i+1}} [ P(x_{i+1}) e^{−λd(x_{i+1},z_{i+1})} / Σ_{z_{i+1}} Q(z_{i+1} | z_1^i) e^{−λd(x_{i+1},z_{i+1})} ] f(x_{i+1}, z_i)   (3.90)

= Σ_{x_{i+1}} [ P(x_{i+1}) e^{−λd(x_{i+1},z_{i+1})} / Σ_{z_{i+1}} Q(z_{i+1} | z_1^i) e^{−λd(x_{i+1},z_{i+1})} ] Σ_{x_i} P(x_i | x_{i+1}) e^{−λd(x_i,z_i)} / Σ_{z_i} Q(z_i) e^{−λd(x_i,z_i)},   (3.91)

which has the same form as the equation in (3.67), and can be solved using the same method. Thus, Q(z_{i+1} | z_1^i) = 1 − p if z_i = z_{i+1}, and Q(z_{i+1} | z_1^i) = p if z_i ≠ z_{i+1}.
Using Lemma 9, equation (3.56) is equivalent to

q(z_1^n | x_1^n) = Π_{i=1}^n Q(z_i | z_{i−1}) exp(−λd(x_i, z_i)) / Σ_{z_i} Q(z_i | z_{i−1}) exp(−λd(x_i, z_i)),   (3.92)

and

q(z_n | z_1^{n−1}, x_n) = q(z_n | x_n, z_{n−1}) = Q(z_n | z_{n−1}) exp(−λd(x_n, z_n)) / Σ_{z_n} Q(z_n | z_{n−1}) exp(−λd(x_n, z_n)).   (3.93)
At the optimal point, the distortion constraint is satisfied with equality:

D = (1/N) Σ_{i=1}^N E[d(x_i, z_i)] = (1/N) Σ_{i=1}^N Σ_{z_i} Σ_{x_i} P(x_i) q(z_i | x_i) d(x_i, z_i).   (3.94)
By the stationarity of the optimal decision, q(z_i | x_i) is given by equation (3.11),

q(z_i | x_i) = e^{−λd(x_i,z_i)} / [ e^{−λd(x_i,Z_i=1)} + e^{−λd(x_i,Z_i=2)} ].   (3.95)
Substituting this into (3.94) yields

e^{−λ} = (D − 1/4) / (3/4 − D) = 2(D − 1/4) / (1 − 2(D − 1/4)).   (3.96)

The dual variable λ is constrained to be positive, which occurs when e^{−λ} ≤ 1. Thus, the above holds only for 1/4 ≤ D ≤ 1/2.
Combining (3.96) with (3.93) and (3.94) yields expressions for q(z_i | z_{i−1}, x_i) and q(z_i | x_i) for all z_i, x_i, z_{i−1}. Using the stationarity of the solution distribution,

R(D) = (1/n) Σ_{x_1^n} Σ_{z_1^n} p(x_1^n) q(z_1^n | x_1^n) log [ q(z_1^n | x_1^n) / Σ_{x̂_1^n} p(x̂_1^n) q(z_1^n | x̂_1^n) ]   (3.97)

= Σ_{x_{i−1}} Σ_{z_{i−1}} Σ_{x_i} Σ_{z_i} p(x_i) q(z_{i−1} | x_{i−1}) p(x_{i−1} | x_i) q(z_i | z_{i−1}, x_i) log [ q(z_i | z_{i−1}, x_i) / Q(z_i | z_{i−1}) ]   (3.98)

= (1/2) Hb( 2((1 − p)(D − 1/4) + p(3/4 − D)) ) − (1/2) Hb( 2(D − 1/4) ).   (3.99)

3.7.3 Proof of Proposition 4
Proposition 4: Let R(T) and D(T) denote the rate and expected distortion as functions of the parameter T, respectively. For large T, the achievable R(D) curve for the above encoding algorithm, given by the points (D(T), R(T)), has slope

lim_{T→∞} [R(T + 1) − R(T)] / [D(T + 1) − D(T)] = −H(M) / (c + (1/4)E[M]),   (3.100)

where M is a random variable denoting the number of slots after the initial T slots until the Z_i sequence changes value, and c is a constant given by

c = Σ_{i=1}^T ( E[1(X_i = 00 or X_i = 01)] − E[1(X_i = 00 or X_i = 01) | X_0 = 10] ).   (3.101)
Proof of Proposition 4. The rate-distortion curve is piecewise linear, with a slope s(T) as a function of T:

s(T) = [R(T + 1) − R(T)] / [D(T + 1) − D(T)].   (3.102)

Each of the quantities in (3.102) can be computed using renewal theory. Define a renewal to be the time when the sequence of encoded symbols changes value, i.e., Z_i ≠ Z_{i−1}. Let L be a random variable denoting the interval between renewals. Each renewal interval can be broken into two parts: an interval of length T, corresponding to the period in which Z_i = Z_{i−1} deterministically, and an interval of length M, a random variable denoting the time after the initial T slots until a renewal occurs. Thus,

E[L] = T + E[M].   (3.103)
As T becomes large, the state process converges to its stationary distribution, implying that for asymptotically large T, the distribution of M is independent of T.

Let D_T be the accumulated distortion over an interval of length T. The expected distortion D(T) can be written using renewal-reward theory as

D(T) = (D_T + (1/4)E[M]) / E[L] = (D_T + (1/4)E[M]) / (T + E[M]),   (3.104)

and

D(T + 1) = (D_T + 1/2 + (1/4)E[M]) / (T + E[M] + 1).   (3.105)
Consider the rate of the encoded sequence. Note that the randomness in the sequence Z_1^L is entirely determined by the length of the renewal interval L. Thus, the rate of the encoded sequence is given by

R(T) = H(L) / E[L] = H(M) / (E[M] + T).   (3.106)

Recall that for asymptotically large T, M is independent of T, and therefore

R(T + 1) = H(L + 1) / E[L + 1] = H(M) / (E[M] + T + 1).   (3.107)
Combining (3.104)-(3.107) with equation (3.102),

s(T) = [ H(M)/(E[M] + T + 1) − H(M)/(E[M] + T) ] / [ (D_T + 1/2 + (1/4)E[M])/(T + E[M] + 1) − (D_T + (1/4)E[M])/(T + E[M]) ]   (3.108)

= [ H(M)(E[M] + T) − H(M)(E[M] + T + 1) ] / [ (D_T + 1/2 + (1/4)E[M])(T + E[M]) − (D_T + (1/4)E[M])(T + E[M] + 1) ]   (3.109)

= −H(M) / ( (1/2)T + (1/4)E[M] − D_T ).   (3.110)
The accumulated distortion D_T over an interval of length T is at most a fixed constant less than T/2. While the state process is approaching its steady-state distribution, the expected accumulated distortion is less than T/2, and it grows as T/2 after steady state is reached. Therefore, T/2 − D_T is equal to a fixed constant, denoted c. Thus,

s(T) = −H(M) / (c + (1/4)E[M]),   (3.111)

which is constant with respect to T, implying that the rate distortion function is asymptotically linear.
Chapter 4

Centralized vs. Distributed: Wireless Scheduling with Delayed CSI
In order to schedule transmissions to achieve maximum throughput, a centralized scheduler must opportunistically make decisions based on the current state of each time-varying channel [61]. The channel state of a link can be measured by its adjacent nodes, which forward this channel state information (CSI) across the network to the scheduler. CSI updates can be piggy-backed on top of data transmissions, or sent before data transmissions if time slots are large enough. However, due to the transmission and propagation delays over the wireless links, it may take several time slots for the scheduler to collect CSI throughout the network, and in that time the network state may have changed.

While the majority of works on wireless scheduling assume current CSI is globally available, in practice the available CSI is a delayed view of the network state. Furthermore, the delay in CSI is proportional to the distance of each link from the controller, since CSI updates must traverse the network. Due to the memory in the channel state process, delayed CSI can still be used for scheduling; however, the presence of delays results in a lower expected throughput [41].
Centralized scheduling algorithms, in which a central entity makes a scheduling decision for the entire network, yield high theoretical performance, since the central entity uses current CSI from throughout the network to compute a globally optimal schedule. However, maintaining current CSI is impractical, due to the latency in acquiring CSI throughout the network. An alternative is a distributed approach, in which each node makes an independent transmission decision based on locally available CSI. These distributed algorithms require no exchange of state information across the network; however, nodes must coordinate to avoid interference. Moreover, local decisions made by distributed algorithms may not achieve a global optimum. As a consequence, distributed algorithms typically achieve only a fraction of the throughput of their ideal centralized counterparts [46]. However, due to delays in gathering CSI for a centralized approach, distributed scheduling may result in a comparatively higher throughput.
In this chapter, we propose a new model for delayed CSI that captures the effect of distance on CSI accuracy. Under this framework, nodes have more accurate CSI pertaining to neighboring links, and progressively less accurate CSI for distant links. We demonstrate that a distributed scheduling approach often outperforms the optimal centralized approach with delayed CSI. In doing so, we illustrate the effect that delays in CSI have on the throughput performance of centralized scheduling. We show that as the memory in the channel state process decreases, there exists a distributed policy that outperforms the optimal centralized policy.

Additionally, we develop sufficient conditions under which there exists a distributed scheduling policy that outperforms the optimal centralized policy in tree and clique networks, illustrating the impact of topology on achievable throughput. We provide simulation results to demonstrate the performance on different topologies, showing that distributed scheduling eventually outperforms centralized scheduling with delayed CSI.

Lastly, we propose a partially distributed scheme, in which a network is partitioned into subgraphs and a controller is assigned to each subgraph. This approach combines elements from centralized and distributed scheduling in order to trade off between the effects of delayed CSI and the sub-optimality of local decisions. We show that there exists a regime in which this approach outperforms both the fully distributed and centralized approaches.
As mentioned in Chapter 1, Ying and Shakkottai study throughput optimal scheduling and routing with delayed CSI and QLI [69]. They show that the throughput optimal policy activates a max-weight schedule, where the weight on each link is given by the product of the delayed queue length and the conditional expected channel state given the delayed CSI. Additionally, they propose a threshold-based distributed policy which is shown to be throughput optimal among a class of distributed policies. In their work, the authors assume arbitrary delays and do not consider the dependence of delay on the network topology. In contrast, by accounting for the relationship between CSI delay and network topology, we are able to effectively compare centralized and distributed scheduling.
The remainder of this chapter is organized as follows. In Section 4.1, we present the mathematical model and problem formulation used in this work, elaborating on the structure of delayed CSI, as well as the properties of centralized and distributed algorithms. In Section 4.2, we show that as the memory in the channel state process decreases, distributed scheduling eventually outperforms centralized scheduling. In Sections 4.3 and 4.4, we analytically characterize the expected throughput in tree and clique topologies, respectively. We present simulation results comparing centralized and distributed policies in Section 4.5. Lastly, in Section 4.6, we introduce a graph partitioning scheme for binary trees and show that there exists a partially distributed approach that outperforms both the fully centralized and fully distributed approaches.
4.1 Model and Problem Formulation
Consider a network consisting of a set of nodes N and a set of links (source-destination pairs) denoted by L. Time is slotted, and in each slot a set of links is chosen to transmit. This set of activated links must satisfy an interference constraint. In this work, we use a primary interference model, in which each node is constrained to activate only one neighboring link. In other words, the set of activated links forms a matching¹, as shown in Figure 4-1.

¹A matching is a set of links such that no two links share an endpoint.
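Under primary interference, checking feasibility of a candidate activation reduces to checking that no node appears in two chosen links. A minimal sketch (node and link labels are illustrative):

```python
def is_matching(links):
    """Return True iff no two links in `links` share an endpoint,
    i.e. the activation satisfies primary interference."""
    seen = set()
    for u, v in links:
        if u in seen or v in seen:
            return False
        seen.update((u, v))
    return True

assert is_matching([(1, 2), (3, 4)])       # node-disjoint: feasible
assert not is_matching([(1, 2), (2, 3)])   # share node 2: infeasible
```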
Figure 4-1: Feasible link activation under primary link interference. Bold edges represent activated links.
Each link l ∈ L has a time-varying channel state s_l ∈ {0, 1}, governed by the Markov chain in Figure 4-2. The state of the channel at link l represents the rate at which data can be transmitted over that link. A channel state of 0 implies that the channel is in an OFF state, and no data can be transmitted over that link. A channel state of 1 corresponds to an ON channel, which can support a unit throughput (i.e., one packet transmission per slot). We are interested in a link activation policy which maximizes the average sum-rate in the network.

Figure 4-2: Markov chain describing the channel state evolution of each independent channel: an OFF channel turns ON with probability p, and an ON channel turns OFF with probability q.
4.1.1 Delayed Channel State Information
An efficient link activation depends on the current state of each channel. Assume that
every node has CSI pertaining to each link in the network; however, this information
is delayed by an amount of time proportional to the distance between the node and
the link in question. Specifically, a node n has k-step delayed CSI of links in Nk+1 (n),
where Nk (n) is the set of links that are k hops away from n. In other words, each
node has current CSI pertaining to its adjacent links, 1-hop delayed CSI of its 2-hop
neighboring links, and so on, as shown in Figure 4-3. This results in each node having
progressively less accurate information of more distant links, modeling the effect of
propagation and transmission delays on the process of collecting CSI.
Figure 4-3: Delayed CSI structure for centralized scheduling. Controller (denoted by crown)
has full CSI of red bold links, one-hop delayed CSI of green dashed links, and two-hop
delayed CSI of blue dotted links.
4.1.2 Scheduling Disciplines
We compare scheduling disciplines based on which nodes make transmission decisions,
and what CSI is available. In particular, we compare policies which are centralized,
where one controller uses delayed CSI to make a decision for the entire network, and
policies which are distributed, where multiple controllers make decisions based only
on current local information.
Centralized Scheduling
A centralized scheduling algorithm consists of a single entity making a global scheduling decision for the entire network. In this work, one node is appointed to be the
centralized decision-maker, referred to as the controller. The controller has delayed
CSI of each link, where the delay is relative to that link's distance from the controller,
and makes a scheduling decision based on the delayed CSI. This decision is then
broadcasted across the network. We assume the centralized controller broadcasts
the chosen schedule to the other nodes in the network instantaneously. In practice,
the decision takes a similar amount of time to propagate from the controller as the
time required to gather CSI, which effectively doubles the impact of delay in the
CSI. Therefore, the theoretical performance of the centralized scheduling algorithm
derived in this work is an upper bound on the performance achievable in practice.
Let S_l(t) be the state of the channel associated with link l at time t, and let d_r(l) be the distance (in hops) of that link from the controller r. The controller has an estimate of this state based on the delayed CSI. Define the belief of a channel to be the probability that the channel is ON given the available CSI at the controller. For link l, the belief x_l(t) is given by

x_l(t) = P(S_l(t) = 1 | S_l(t − d_r(l))).   (4.1)
The belief is derived from the k-step transition probabilities of the Markov chain in Figure 4-2. Namely,

P(S(t) = j | S(t − k) = i) = p_ij^k,   (4.2)

where p_ij^k is computed as

p_00^k = (q + p(1 − p − q)^k)/(p + q),   p_01^k = (p − p(1 − p − q)^k)/(p + q),
p_10^k = (q − q(1 − p − q)^k)/(p + q),   p_11^k = (p + q(1 − p − q)^k)/(p + q).   (4.3)
Throughout this work, assume that 1 − q ≥ p, corresponding to "positive memory," i.e., an ON channel is more likely to remain ON than to turn OFF. Consequently, the k-step transition probabilities satisfy the following inequality:

0 ≤ p_01^i ≤ p_01^j ≤ p_11^k ≤ p_11^l ≤ 1,   ∀ i ≤ j, ∀ l ≤ k.   (4.4)

As the CSI of a channel grows stale, the probability that the channel is ON is given by the stationary distribution of the chain in Figure 4-2, denoted as π:

lim_{k→∞} p_01^k = lim_{k→∞} p_11^k = π = p/(p + q).   (4.5)
Since the objective is to maximize the expected sum-rate throughput, the optimal scheduling decision at each time slot is given by the maximum likelihood (ML) rule, which is to activate the links that are most likely to be ON, i.e., the links with the highest belief. Under the primary interference constraint, a set of links can only be scheduled simultaneously if that set forms a matching. Let M be the set of all matchings in the network. The maximum expected sum-rate is formulated as

max_{m∈M} E[ Σ_{l∈m} S_l(t) | {S_l(t − d_r(l))}_{l∈L} ]   (4.6)
= max_{m∈M} Σ_{l∈m} E[ S_l(t) | S_l(t − d_r(l)) ]   (4.7)
= max_{m∈M} Σ_{l∈m} x_l(t).   (4.8)

Thus, the optimal schedule is a maximum weighted matching, where the weight of each link is equal to its belief at the current time.
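The belief weighting and the resulting schedule can be illustrated with a short Python sketch (ours, not part of the thesis; the brute-force matching search is only suitable for small examples):

```python
from itertools import combinations

def belief(s_delayed, delay, p, q):
    """Belief x_l(t) = p^k_{s,1}: probability the link is ON given its delayed state."""
    pi = p / (p + q)
    return pi + (s_delayed - pi) * (1 - p - q) ** delay

def max_weight_matching(links, weight):
    """Brute-force maximum weighted matching (fine for small topologies)."""
    best, best_w = frozenset(), 0.0
    for r in range(1, len(links) + 1):
        for sub in combinations(links, r):
            nodes = [n for l in sub for n in l]
            if len(nodes) == len(set(nodes)):          # no shared endpoints
                w = sum(weight[l] for l in sub)
                if w > best_w:
                    best, best_w = frozenset(sub), w
    return best

# Path 1-2-3-4: controller at node 1 has increasingly stale CSI down the path.
p, q = 0.1, 0.1
links = [(1, 2), (2, 3), (3, 4)]
delayed_state = {(1, 2): 1, (2, 3): 1, (3, 4): 0}
delay = {(1, 2): 0, (2, 3): 1, (3, 4): 2}
weights = {l: belief(delayed_state[l], delay[l], p, q) for l in links}
# Beliefs 1.0 and 0.18 for (1,2) and (3,4) together beat 0.9 for (2,3) alone.
assert max_weight_matching(links, weights) == {(1, 2), (3, 4)}
```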
Distributed Scheduling
A distributed scheduling algorithm consists of multiple entities making independent
decisions without explicit coordination. In this work, each node makes a transmission
decision for its neighboring links using only local information (CSI of adjacent links),
which is readily available at each node, resulting in performance that is unaffected by
the delay in CSI. The drawback of such policies is that transmission decisions made
using local information may not be globally optimal.
In order to avoid collisions, we consider distributed policies in which decisions
are made sequentially. As a consequence, in addition to having local information,
each node observes the actions of neighboring nodes in a manner similar to collision
avoidance in a CSMA-CA system. If a node begins transmission, neighboring nodes
detect this transmission and activate a non-conflicting link rather than an interfering
link. This allows us to focus on the sub-optimality resulting from making a local
instead of a global decision, rather than the transmission coordination needed to
avoid collisions. Moreover, alternative transmission coordination schemes are also
possible based on RTS/CTS exchanges [42]. To determine the sequence in which
decisions are made, priorities are assigned to each node off-line, and transmissions
are made in order of node priority².
Figure 4-4: Example network: all links are labeled by their channel state at the current time, and bold links represent activated links. (a) Suboptimal matching with sum rate 1. (b) Optimal matching with sum rate 2.
While distributed algorithms are designed to avoid collisions between neighboring
transmitters, the restriction to using only local information results in distributed
algorithms suffering from “missed opportunities”. In graph-theoretic language, the
distributed scheduler returns a maximal matching, or a local optimum, rather than
a maximum matching, or a global optimum. For example, in Figure 4-4, node n can
choose to schedule either of its neighboring links; if it schedules its right child link,
then the total sum rate of the resulting schedule is 1, as in Figure 4-4a, whereas
scheduling the left link results in a sum rate of 2, as in Figure 4-4b. In a distributed
framework, node n is unaware of the state of the rest of the network, so it makes an
arbitrary decision, resulting in a throughput loss. If node n were a centralized controller
(with perfect CSI), it would always return the schedule in Figure 4-4b. Moreover, the
loss in efficiency due to suboptimal decisions becomes more pronounced when moving
²Here we assume a small propagation delay, such that nodes can immediately detect if a neighbor is transmitting.
beyond the simple two-state channel model.
Partially Distributed Scheduling
A third class of scheduling algorithms is those that combine elements of centralized
and distributed scheduling. These algorithms are referred to as partially distributed
scheduling algorithms. A partially distributed approach divides the network into
multiple control regions, and assigns a controller to schedule the links in each region.
Each controller has delayed CSI pertaining to the links in its control region, and
no CSI pertaining to links in other regions. This allows for scheduling with fresher
information than a purely centralized scheme, and less local sub-optimality than a
fully distributed scheme. These policies are explored in Section 4.6.
4.2 Centralized vs. Distributed Scheduling
In the previous section, we introduced two primary classes of scheduling policies: distributed and centralized policies. It is known that a centralized scheme using perfect
CSI outperforms distributed schemes, due to the aforementioned loss of efficiency in
localized decisions. However, these results ignore the effect of delays in CSI. In this
section, we revisit the comparison between centralized and distributed scheduling,
taking into account the delay in collecting CSI. We show that for sufficiently large
CSI delays, distributed policies perform at least as well as the optimal centralized
policy.
Figure 4-5: Four-node ring topology.
As an example, consider the four-node network in Figure 4-5, and a symmetric channel state model satisfying p = q. Without loss of generality, assume node 1 is the controller. In a centralized scheduling scheme, node 1 chooses a schedule based on current CSI for links (1, 2) and (1, 4), and 1-hop delayed CSI for links (2, 3) and (3, 4). The resulting expected throughput is computed by conditioning on the state of each link:

C(p) = (1/4)((3/4)(1 − p) + (1/4)p) + (1/2)(1 + 1/2) + (1/4)(1 + (3/4)(1 − p) + (1/4)p) = 11/8 − p/4.   (4.9)
Now consider a distributed schedule, in which node 1 makes a scheduling decision based on the state of adjacent links (1, 2) and (1, 4). After this decision is made, node 3 makes a non-conflicting decision based on the state of links (2, 3) and (3, 4). The resulting expected throughput is given by

D = (1/4)(3/4) + (3/4)(1 + 1/2) = 21/16.   (4.10)
The expected throughput for centralized and distributed scheduling in (4.9) and (4.10)
is plotted in Figure 4-6. As the channel transition probability p increases, the memory
in the channel decreases, and the expected throughput of a centralized scheduler
decreases. The distributed scheduler, on the other hand, is unaffected by the channel
transition probability, as it only uses non-delayed local CSI. For channel transition
probabilities p ≥ 1/4, distributed scheduling outperforms centralized scheduling over
this network.
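The two expressions can be cross-checked by exhaustive conditioning. The following sketch (illustrative, not from the thesis) enumerates the relevant channel states of the four-node ring with exact rationals and recovers (4.9), (4.10), and the crossover at p = 1/4:

```python
from fractions import Fraction
from itertools import product

NEAR = [(1, 2), (1, 4)]     # controller (node 1) has current CSI of these links
FAR = [(2, 3), (3, 4)]      # and 1-hop delayed CSI of these
MATCHINGS = [frozenset()] + [frozenset([l]) for l in NEAR + FAR] + \
            [frozenset([(1, 2), (3, 4)]), frozenset([(1, 4), (2, 3)])]

def centralized(p):
    """Exact expected sum-rate of the centralized scheduler (symmetric channel, q = p)."""
    total = Fraction(0)
    for s12, s41, d23, d34 in product((0, 1), repeat=4):   # each state has prob 1/16
        belief = {(1, 2): Fraction(s12), (1, 4): Fraction(s41),
                  (2, 3): 1 - p if d23 else p,
                  (3, 4): 1 - p if d34 else p}
        best = max(MATCHINGS, key=lambda m: sum(belief[l] for l in m))
        # A belief equals the conditional expectation of the current state, so the
        # weight of the chosen matching is its expected realized rate.
        total += Fraction(1, 16) * sum(belief[l] for l in best)
    return total

def distributed():
    """Exact expected sum-rate of the sequential distributed scheduler."""
    total = Fraction(0)
    for s12, s41, s23, s34 in product((0, 1), repeat=4):
        rate = 0
        first = (1, 2) if s12 else ((1, 4) if s41 else None)   # node 1's local pick
        if first:
            rate += 1
        for link, state in (((2, 3), s23), ((3, 4), s34)):     # node 3 goes second
            if state and (first is None or not set(link) & set(first)):
                rate += 1
                break
        total += Fraction(1, 16) * rate
    return total

assert distributed() == Fraction(21, 16)                        # eq. (4.10)
for p in (Fraction(0), Fraction(1, 8), Fraction(1, 4), Fraction(1, 2)):
    assert centralized(p) == Fraction(11, 8) - p / 4            # eq. (4.9)
assert centralized(Fraction(1, 4)) == distributed()             # crossover at p = 1/4
```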
The throughput degradation of the centralized scheme is a function of the memory
in the channel state process. Let µ be a metric reflecting this memory. In the case of
a two-state Markov chain, we define
µ ≜ 1 − p − q.   (4.11)
Note that µ is the second eigenvalue of the transition matrix for the two-state Markov
chain, and thus represents the rate at which the chain converges to its steady state
Figure 4-6: Expected sum-rate throughput for centralized and distributed scheduling algorithms over four-node ring topology, as a function of channel transition probability p.
distribution [21].
Theorem 17. For a fixed steady-state probability π, there exists a threshold µ∗ such
that if µ ≤ µ∗ , there exists a distributed scheduling policy that outperforms the optimal
centralized scheduling policy.
In order to prove Theorem 17, we present several intermediate results pertaining
to the expected sum-rate throughput of both the distributed and centralized schemes.
Lemma 11. For a fixed steady-state probability π, and state transition probabilities
p and q = (1 − π)p/π, the expected sum-rate of any distributed policy is independent of the channel memory µ.
Proof. This follows from the definition of a distributed policy in Section 4.1.2. Since
distributed policies are restricted to only use CSI of neighboring links, which is available to each node without delay, the values of p and q do not affect the sum-rate. The
expected sum-rate of a distributed policy only depends on the steady-state probability that links are ON. For fixed π, the expected sum-rate of the distributed policy is
constant.
Lemma 12. The expected sum-rate of the optimal centralized policy is greater than
or equal to that of any distributed policy when µ = 1.
Proof. When µ = 1, there is full memory in the channel state process, i.e. p = 0,
and q = 0. In this case, the centralized policy has perfect CSI throughout the
network, and activates the sum-rate maximizing schedule, representing a globally
optimal solution.
Lemma 13. There exists a distributed policy with sum rate greater than or equal to
the sum rate of the optimal centralized policy when µ = 0.
Proof. If µ = 0, then the channel transition probabilities p and q satisfy p = 1 − q.
In this scenario, there is no memory in the channel state process, and delayed CSI is
useless in predicting the current channel state. To see this, consider the conditional
probability of a channel state given the previous channel state.
P(S(t + 1) = 1|S(t) = 0) = p = 1 − q = P(S(t + 1) = 1|S(t) = 1)
(4.12)
P(S(t + 1) = 0|S(t) = 0) = 1 − p = q = P(S(t + 1) = 0|S(t) = 1)
(4.13)
Thus, when µ = 0, the channel state process is IID over time.
Let G be the graph representing the topology of the network with the controller
labeled as node 0. Let N0 be the set of neighbors of node 0, and ∆ be the degree
of node 0, i.e. ∆ = |N0 |. Let G0 ⊂ G be the graph obtained by removing the links
adjacent to the controller from the network. Similarly, let Gi ⊂ G0 be the graph
obtained by removing the links adjacent to node i from G0. Recall that a matching M of
a graph G is any subset of the edges of G such that no two edges share a node. Let
M0 be a maximum (cardinality) matching over G0 , and Mi be a maximum cardinality
matching over Gi .
Due to the IID channel process, each link adjacent to the controller either has
belief 0 or 1, and each non-adjacent link has belief π. Thus, the optimal centralized
scheduler observes the state of its adjacent links and chooses a maximum throughput
link activation. There are 2∆ possible state combinations observed by the controller;
however, due to the fact that the controller can only activate one adjacent link,
the optimal centralized schedule is one of at most ∆ + 1 matchings. Without loss
of generality, when the controller does not activate an adjacent link, it activates
matching M0 , and if the controller activates link (0, i) for i ∈ N0 , then it also activates
matching Mi .
Lemma 13 is proved by constructing a distributed policy which activates the same
links as the optimal centralized schedule. The ∆ + 1 potential activations can be
computed off-line³, and we assume each node knows the set of possible activations.
Each node must determine which activation to use in a distributed manner. To
accomplish this, node 0 activates the same adjacent link as in the centralized scheme,
which is feasible since the centralized controller uses only local CSI when µ = 0. Every
other node n activates links according to the matching M0 , unless that activation
interferes with a neighboring activation. If a conflict occurs, then node 0 must have
transmitted according to some other Mi for i ∈ N0 , and node n detects this conflict,
and activates links according to the appropriate Mi . The remainder of the proof
explains the details of this distributed algorithm.
Figure 4-7: Example of combining matchings to generate components. Red links and blue links correspond to maximum cardinality matchings M0 and Mi. The component containing node i is referred to as path Pi.
Consider the graph composed of the nodes in G and the edges in both M0 and Mi ,
as done in [47], labeling edges in M0 as red and edges in Mi as blue. An example is
shown in Figure 4-7. The resulting graph consists of multiple connected components,
where each component is either a path or a cycle alternating between red and blue
links. Note that every component not containing node i has the same number of
red and blue links, since both matchings have maximum cardinality. Consider the
component including node i, which must be a path since no blue links can be adjacent
³To compute the set of potential activations, consider the case where only one link adjacent to the controller is ON, as well as the case where all adjacent links are OFF.
to node i. Denote this path as Pi . If node 0 schedules link (0, i), then nodes in path
Pi must schedule blue links instead of red links. Since each node detects neighboring
transmissions, this can be accomplished in a distributed manner. In all other components, either red links or blue links can be scheduled to obtain maximum throughput,
because each component has an equal number of red and blue links, and switching between red and
blue links will not affect any other components.
Figure 4-8: Abstract representation of a node n's position on multiple conflicting paths. (a) Scenario 1: a conflict detected from neighbor x corresponds to matching Mi, and a conflict detected from neighbor y corresponds to matching Mj. (b) Scenario 2: node n can activate according to either Mi or Mj if a conflict is detected at neighbor x.
The remaining detail concerns the decision of which of the ∆ alternate matchings
to use if M0 conflicts with a neighboring transmission. As explained above, node
n is informed of the switch to matching Mi by blue links being activated on path
Pi , propagating from node i. If node n does not lie on any path Pi for i ∈ N0 , then
activating links according to matching M0 never conflicts with any other transmissions
at node n. If node n lies on a single path Pi , then upon detecting a conflicting
transmission, node n switches to matching Mi . If there are i, j ∈ N0 , such that
n ∈ Pi and n ∈ Pj , then node n decides between Mi and Mj based on the direction
(neighbor) from which the conflicting transmission is detected, as illustrated in Figure
4-8a. If Pi and Pj are such that the conflicting link at node n is detected from the
same neighbor, as in Figure 4-8b, then either Mi or Mj can be used.
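The path/cycle structure used in this proof is easy to check computationally. The sketch below (a hypothetical example, not from the thesis) overlays two maximum matchings of a path and verifies that every node touches at most one red and one blue link, so each component of the union is an alternating path or cycle:

```python
from collections import defaultdict

# Two maximum matchings of the 7-node path 0-1-2-3-4-5-6 (hypothetical example).
M0 = {(0, 1), (2, 3), (4, 5)}   # "red" matching
Mi = {(1, 2), (3, 4), (5, 6)}   # "blue" matching

degree = defaultdict(lambda: {"red": 0, "blue": 0})
for u, v in M0:
    degree[u]["red"] += 1; degree[v]["red"] += 1
for u, v in Mi:
    degree[u]["blue"] += 1; degree[v]["blue"] += 1

# Each node is covered at most once per matching, so in the union it has degree
# at most 2 -- every connected component is an alternating path or cycle.
assert all(d["red"] <= 1 and d["blue"] <= 1 for d in degree.values())

# Connected components of the union, via depth-first search.
adj = defaultdict(set)
for u, v in M0 | Mi:
    adj[u].add(v); adj[v].add(u)
seen, n_components = set(), 0
for start in adj:
    if start not in seen:
        n_components += 1
        stack = [start]
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(adj[node] - seen)
assert n_components == 1   # here the union is a single alternating path P_i
```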
Lemma 14. Let C(p, q) be the sum-rate of the optimal centralized algorithm as a
function of the channel transition probabilities p and q. For a fixed value of π, C(p, q)
is monotonically increasing in µ = 1 − p − q.
Proof. Let Φ represent the set of feasible schedules (matchings), and φ ∈ Φ be a binary
vector, such that φl indicates whether link l is activated in the schedule. Consider
two channel-state distributions, one with transition probabilities p1 and q1 , and the
other with probabilities p2 and q2 , satisfying π1 = π2 = π. Furthermore, assume that
µ1 ≥ µ2. Let a^k_{s,1} (resp. b^k_{s,1}) represent the k-step transition probability from s to 1 when the one-step transition probabilities are p1 and q1 (resp. p2 and q2). Lastly, let d_r(l) be the distance of link l from controller r, and let S(t − d_r) = (S_l(t − d_r(l)))_{l∈L} be the delayed CSI vector, whose lth element is the delayed CSI of link l with delay equal to d_r(l) slots.
Let φ1 (s) and φ2 (s) be binary vectors representing the optimal schedules for state
s, when the state transition probability is (p1 , q1 ) and (p2 , q2 ) respectively, with an
arbitrary rule for breaking ties, i.e.
φ1(s) = arg max_{φ∈Φ} Σ_{l∈L} φ_l a^{d_r(l)}_{s_l,1}   (4.14)

φ2(s) = arg max_{φ∈Φ} Σ_{l∈L} φ_l b^{d_r(l)}_{s_l,1}.   (4.15)
The expected sum-rate of the centralized scheme is expressed as

C(p1, q1) = Σ_{s∈S} P(S(t − d_r) = s) Σ_{l∈L} φ1_l(s) a^{d_r(l)}_{s_l,1}   (4.16)

C(p2, q2) = Σ_{s∈S} P(S(t − d_r) = s) Σ_{l∈L} φ2_l(s) b^{d_r(l)}_{s_l,1}.   (4.17)
To prove the monotonicity of C(p, q), we show that for all p1 , q1 , p2 , q2 satisfying
π1 = π2 and µ1 ≥ µ2 ,
C(p1 , q1 ) − C(p2 , q2 ) ≥ 0.
(4.18)
The above difference is bounded as follows:

C(p1, q1) − C(p2, q2) = Σ_{s∈S} P(S(t − d_r) = s) Σ_{l∈L} φ1_l(s) a^{d_r(l)}_{s_l,1} − Σ_{s∈S} P(S(t − d_r) = s) Σ_{l∈L} φ2_l(s) b^{d_r(l)}_{s_l,1}   (4.19)

≥ Σ_{s∈S} P(S(t − d_r) = s) Σ_{l∈L} φ2_l(s) ( a^{d_r(l)}_{s_l,1} − b^{d_r(l)}_{s_l,1} ),   (4.20)
where the inequality follows from the fact that φ2 is the maximizing schedule for channel 2, and not channel 1. The proof follows by partitioning the state space into sets of states for which every state in a set yields the same optimal schedule. Let Sφ ⊂ S be the set of states such that φ is the optimal schedule, i.e.,

Sφ = {s ∈ S | φ2(s) = φ}.   (4.21)

Due to the arbitrary tie-breaking rule in the optimization of φ2(s) in (4.15), each s belongs to exactly one Sφ. In other words, the sets {Sφ} are disjoint, and ∪_{φ∈Φ} Sφ = S. Therefore, (4.20) can be rewritten as
C(p1, q1) − C(p2, q2) ≥ Σ_{φ∈Φ} Σ_{s∈Sφ} P(S(t − d_r) = s) Σ_{l∈L} φ_l ( a^{d_r(l)}_{s_l,1} − b^{d_r(l)}_{s_l,1} ).   (4.22)
The quantity a^{d_r(l)}_{s_l,1} − b^{d_r(l)}_{s_l,1} simplifies using (4.3) and µi = 1 − pi − qi:

a^{d_r(l)}_{s_l,1} − b^{d_r(l)}_{s_l,1} = π + (s_l − π)µ1^{d_r(l)} − π − (s_l − π)µ2^{d_r(l)}   (4.23)
= (s_l − π)( µ1^{d_r(l)} − µ2^{d_r(l)} ).   (4.24)
Combining (4.22) and (4.24) yields

C(p1, q1) − C(p2, q2) ≥ Σ_{φ∈Φ} Σ_{s∈Sφ} P(S(t − d_r) = s) Σ_{l∈L} φ_l (s_l − π)( µ1^{d_r(l)} − µ2^{d_r(l)} )   (4.25)

= Σ_{φ∈Φ} Σ_{s∈Sφ} Σ_{l∈L} φ_l [ ∏_{j∈L} P(S_j(t − d_r(j)) = s_j) ] (s_l − π)( µ1^{d_r(l)} − µ2^{d_r(l)} )   (4.26)

= Σ_{φ∈Φ} Σ_{l∈L} Σ_{s∈Sφ} φ_l π(1 − π)(−1)^{1−s_l} ( µ1^{d_r(l)} − µ2^{d_r(l)} ) ∏_{j∈L\l} P(S_j(t − d_r(j)) = s_j),   (4.27)
where (4.26) follows from the independence of the channel state process across links, and (4.27) follows from:

P(S_l(t − d_r(l)) = s_l)(s_l − π) = (π s_l + (1 − π)(1 − s_l))(s_l − π)   (4.28)
= π s_l (s_l − π) + (1 − π)(1 − s_l)(s_l − π)   (4.29)
= (−1)^{1−s_l} π(1 − π).   (4.30)
We prove that for any schedule φ ∈ Φ and link l ∈ L,

Σ_{s∈Sφ} φ_l π(1 − π)(−1)^{1−s_l} ( µ1^{d_r(l)} − µ2^{d_r(l)} ) ∏_{j∈L\l} P(S_j(t − d_r(j)) = s_j) ≥ 0.   (4.31)
Fix a schedule φ ∈ Φ and link l ∈ L. The summand in (4.31) is non-zero only if φ_l = 1, i.e., link l is in the schedule φ. The summand is negative if and only if s_l = 0. Consider a delayed CSI vector s ∈ Sφ such that s_l = 0, and the delayed CSI vector s̃ obtained by changing the lth element of s to 1, i.e., s̃_j = s_j ∀j ≠ l, s̃_l = 1. Since s ∈ Sφ, it follows that s̃ ∈ Sφ. This is because link l is scheduled under φ, and the throughput obtained by scheduling link l strictly increases in moving from s to s̃, so the same schedule must remain optimal. Therefore, for every element s ∈ Sφ contributing a negative term to the summation in (4.31), there exists another state s̃ ∈ Sφ contributing a positive term of equal magnitude, implying that the entire summation must be non-negative.
Proof of Theorem 17. Let C(µ) be the expected sum-rate throughput of the optimal centralized algorithm as a function of the memory in the channel. The theorem is proved by showing that there exists a distributed policy with expected sum-rate D(π), such that the relationship between C(µ) and D(π) is similar to that in Figure 4-6 for fixed π⁴. Since C(µ) is monotonically increasing in µ (Lemma 14), with C(1) ≥ D(π) (Lemma 12) and C(0) ≤ D(π) (Lemma 13), and D(π) is constant over µ for fixed π (Lemma 11), C(µ) must intersect D(π), and this intersection occurs at µ∗ for some 0 ≤ µ∗ ≤ 1.
Theorem 17 proves the existence of a threshold µ∗ such that for µ ≤ µ∗, distributed scheduling outperforms the optimal centralized scheduler. The value of µ∗ depends on the topology, and in general this threshold is difficult to compute. In some topologies, µ∗ is 0 or 1, implying that distributed scheduling is always optimal, or that distributed scheduling is only optimal if there is no memory in the channel. In the following sections, we characterize the value of µ∗ in tree networks (Section 4.3) and clique networks (Section 4.4), and show that for large networks, µ∗ approaches 1.
4.3 Tree Topologies
In this section, we characterize the expected throughput over networks with tree
topologies. The acyclic nature of these graphs makes them amenable to analysis. We
focus on rooted trees, such that one node is the root and every other node has a depth
equal to the distance to the root. Furthermore, for any node v, the nodes that are
connected to v but have depth greater than v are referred to as children of v, and
children of v are siblings of one another. If u is a child of v, then v is the parent of
u. This familial nomenclature is standard in the graph-theoretic literature [26], and
simplifies description of the algorithms over tree networks. A complete k-ary tree of
depth n is a tree such that each node of depth less than n has k children, and the
nodes at depth n are leaf nodes, i.e. they have no children. Additionally, this section
focuses on symmetric channel models such that p = q to simplify the analysis, but
the results are easily extended to asymmetric channels as well.
⁴Figure 4-6 presents throughput as a decreasing function of p, whereas in this theorem we have an increasing function of µ.
4.3.1 Distributed Scheduling on Tree Networks
Consider applying the distributed scheduling algorithm in Section 4.1.2 over a complete k-ary tree of depth n, where priorities are assigned in order of depth (lower
depth has higher priority). The root node first makes a decision for its neighboring
links. Then, the children of the root attempt to activate one of their child links, if this
activation does not conflict with their parent’s decision. Consequently, the average
sum rate can be written recursively. Let Dnk be the average sum rate of the distributed
algorithm over a complete k-ary tree of depth n. To begin, consider the case of k = 2
(binary tree).
Figure 4-9: Recursive distributed scheduling over binary trees. (a) When both adjacent links are OFF, neither is scheduled; the expected throughput is 2D²_{n−1}. (b) When at least one link is ON, it is scheduled; dotted links cannot be activated, so the expected throughput is 1 + D²_{n−1} + 2D²_{n−2}.
D²_n = (1/4) · 2D²_{n−1} + (3/4)(1 + D²_{n−1} + 2D²_{n−2}) = 3/4 + (5/4)D²_{n−1} + (3/2)D²_{n−2}   (4.32)
Equation (4.32) follows from conditioning on the links adjacent to the root. The
first term corresponds to the case where both links are OFF. In this case, neither is
activated and the algorithm recurses over the subtrees rooted by the children, as in
Figure 4-9a. If at least one link is ON, it is activated. In this case, that child cannot
transmit, so control passes to the grandchildren, as in Figure 4-9b. Solving the above recursion for n ≥ 1 using D²_0 = D²_{−1} = 0 yields
D²_n = −(9/77)(−3/4)^n + (6/11) · 2^n − 3/7.   (4.33)
The average sum-rate in (4.33) of the distributed scheduling algorithm is independent of the link transition probability p, as each node only uses the CSI of its neighboring links, which is available without delay. This follows from Lemma 11.
Consider the asymptotic per-link throughput as the number of links grows large. An n-level binary tree has 2^{n+1} − 2 links. Using the expression in (4.33), and taking the limit as n grows large while dividing by the number of links, yields

lim_{n→∞} D²_n / (2^{n+1} − 2) = 3/11.   (4.34)
Thus, the distributed priority algorithm achieves a throughput of at least 3/11 per link. A similar analysis is applied to a general full k-ary tree, and a recursive expression is written in the vein of (4.32):

D^k_n = (1/2)^k · kD^k_{n−1} + (1 − (1/2)^k)(1 + (k − 1)D^k_{n−1} + kD^k_{n−2})   (4.35)
A closed-form expression is obtained by solving the above recursion:

D^k_n = [k(2^k − 1) / ((k − 1)((k + 1)2^k − 1))] k^n − [(2^k − 1)² / ((2^{k+1} − 1)((k + 1)2^k − 1))] (−(1 − (1/2)^k))^n − (2^k − 1) / ((k − 1)(2^{k+1} − 1))   (4.36)
To determine the asymptotic per-link throughput, we divide (4.36) by the number of links in a k-ary tree, (k^{n+1} − 1)/(k − 1) − 1. Taking a limit as n grows large,

lim_{n→∞} D^k_n / ((k^{n+1} − 1)/(k − 1) − 1) = lim_{n→∞} (k − 1)D^k_n / (k^{n+1} − k) = (2^k − 1)/(2^k − 1 + k · 2^k).   (4.37)

Since for large k, 2^k ≫ 1, this limit is approximately equal to 1/(k + 1). Intuitively, each node can only activate one neighboring link, and each node has k + 1 neighbors.
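The recursion (4.35) and its solution can be cross-checked mechanically. The sketch below (ours; the closed form is entered as the solution of the recursion, whose characteristic roots are k and −(1 − 2^{−k})) verifies agreement for several k and n, and the per-link limit (4.37), using exact rationals:

```python
from fractions import Fraction

def d_rec(k, n, memo=None):
    """D_n^k via the recursion (4.35), with D_0^k = D_{-1}^k = 0."""
    if memo is None:
        memo = {}
    if n <= 0:
        return Fraction(0)
    if n not in memo:
        a = Fraction(1, 2) ** k
        memo[n] = (a * k * d_rec(k, n - 1, memo)
                   + (1 - a) * (1 + (k - 1) * d_rec(k, n - 1, memo)
                                + k * d_rec(k, n - 2, memo)))
    return memo[n]

def d_closed(k, n):
    """Closed form for D_n^k; at k = 2 it reduces to (4.33)."""
    w = 2 ** k
    A = Fraction(k * (w - 1), (k - 1) * ((k + 1) * w - 1))
    B = Fraction((w - 1) ** 2, (2 * w - 1) * ((k + 1) * w - 1))
    c = Fraction(w - 1, (k - 1) * (2 * w - 1))
    return A * Fraction(k) ** n - B * Fraction(-(w - 1), w) ** n - c

for k in (2, 3, 4):
    for n in range(8):
        assert d_rec(k, n) == d_closed(k, n)
assert d_closed(2, 1) == Fraction(3, 4)        # depth-1 binary tree
# Per-link limit (4.37): (k-1) D_n^k / (k^{n+1} - k) -> (2^k - 1)/(2^k - 1 + k 2^k)
k, n = 3, 25
ratio = (k - 1) * d_closed(k, n) / (k ** (n + 1) - k)
assert abs(ratio - Fraction(2**k - 1, 2**k - 1 + k * 2**k)) < Fraction(1, 10**9)
```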
4.3.2 On Distributed Optimality
In the above analysis of distributed scheduling over tree networks, it was assumed
that priorities are assigned such that nodes closer to the root have higher priority.
Interestingly, for tree networks, there exists an ordering of priorities such that the
distributed policy is optimal, i.e. returns a schedule of maximum weight, and therefore
always performs at least as well as the optimal centralized scheduler.
Theorem 18. There exists an optimal distributed algorithm on tree networks that
obtains an expected sum-rate equal to that of a centralized scheduler with perfect information.
Figure 4-10: Example matchings. (a) Maximum matching. (b) Augmented matching including l. If link l is required to be in the matching, there exists a new maximal matching including l.
Proof. Consider the policy that gives priority to the leaves of the network. If a link
adjacent to a leaf is ON, without loss of generality, there exists a maximum matching
containing that link. Assume the optimal matching did not include this ON link. A
new matching is constructed by adding the leaf link, and removing the link which
interferes with it, as illustrated in Figure 4-10. Since the new link is adjacent to a
leaf, at most one interferer exists in the matching. Thus, the augmented matching is
also optimal. Therefore, it is always optimal to include an ON leaf link in the optimal
matching. The links interfering with that leaf cannot be activated, and the algorithm
recurses. In conclusion, assigning priorities in order of highest depth to lowest depth
results in a maximum matching.
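The leaf-priority rule in this proof can be sketched directly (illustrative code, not from the thesis): repeatedly take a leaf, match its adjacent link if that link is ON, and peel the matched nodes away. On a forest this greedy procedure returns a maximum matching of the ON subgraph.

```python
def leaf_priority_matching(adj, states):
    """Greedy leaf-priority matching on a forest.

    adj:    dict node -> set of neighboring nodes
    states: dict (u, v) -> channel state in {0, 1} (either key order)
    Returns the set of activated links as frozensets."""
    on = lambda u, v: states.get((u, v), states.get((v, u), 0))
    adj = {u: set(vs) for u, vs in adj.items()}
    matching = set()
    while adj:
        u = min(adj, key=lambda x: len(adj[x]))      # a leaf (or isolated node)
        if not adj[u]:
            del adj[u]
            continue
        v = next(iter(adj[u]))
        if on(u, v):
            matching.add(frozenset((u, v)))           # an ON leaf link is always safe
            removed = (u, v)
        else:
            removed = (u,)                            # an OFF leaf link never helps
        for r in removed:
            for w in adj.pop(r, ()):
                adj[w].discard(r)
    return matching

# A small tree in the spirit of Figure 4-4 (hypothetical states): greedy leaf
# scheduling recovers the maximum matching {(0, 1), (2, 3)} of the ON links.
adj = {0: {1, 2}, 1: {0, 5, 6}, 2: {0, 3, 4}, 3: {2}, 4: {2}, 5: {1}, 6: {1}}
states = {(0, 1): 1, (0, 2): 1, (2, 3): 1, (2, 4): 0, (1, 5): 0, (1, 6): 0}
assert leaf_priority_matching(adj, states) == {frozenset((0, 1)), frozenset((2, 3))}
```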
While Theorem 18 shows that there exists an optimal priority assignment, it
does not hold for general topologies. Thus, we use the results in Section 4.3.1 to
compare the cost of suboptimal local decisions to the cost of scheduling with delayed
information.
4.3.3 Centralized Scheduling on Tree Topologies
The optimal centralized policy schedules a maximum weight matching over the network, where the weight of each link is the belief given the delayed CSI. For tree
networks, the maximum-weight matching is the solution to a dynamic programming
(DP) problem. Consider a node v ∈ N . Let g1 (v) be the maximum weight matching
of the subtree rooted at v, assuming that v activates one of its child links. Let g2 (v)
be the maximum weight matching of the subtree rooted at v assuming that v cannot
activate a child link, due to interference from the parent of v. Let r ∈ N be the
controller (root of the tree), and dr (v) be the distance of node v from r. Let child(v)
be the set of children to node v. Assume the controller has delayed CSI of each link
(u, v) equal to s(u, v). The DP formulation for the weight of the optimal max-weight
matching g ∗ (v) is given by
g∗(v) = max(g1(v), g2(v))   (4.38)

g2(v) = Σ_{u∈child(v)} g∗(u)   (4.39)

g1(v) = max_{u∈child(v)} [ p^{d_r(v)}_{s(u,v),1} + g2(u) + Σ_{n∈child(v)\u} g∗(n) ] = g2(v) + max_{u∈child(v)} ( p^{d_r(v)}_{s(u,v),1} + g2(u) − g∗(u) )   (4.40)
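This DP is straightforward to implement. The following sketch (illustrative; the names are ours, not the thesis's) evaluates (4.38)-(4.40) bottom-up on a rooted tree whose link weights are the beliefs:

```python
def max_weight_tree_matching(children, weight, root):
    """Evaluate the DP (4.38)-(4.40); returns g*(root), the max-weight matching value.

    children: dict node -> list of child nodes (empty/absent for leaves)
    weight:   dict (parent, child) -> belief of that link"""
    g_star, g2 = {}, {}

    def solve(v):
        for u in children.get(v, []):
            solve(u)
        g2[v] = sum(g_star[u] for u in children.get(v, []))       # eq. (4.39)
        best_gain = 0.0
        for u in children.get(v, []):
            # Activating (v, u) earns its weight but forfeits u's own activation.
            best_gain = max(best_gain, weight[(v, u)] + g2[u] - g_star[u])
        g_star[v] = g2[v] + best_gain                             # eqs. (4.38), (4.40)

    solve(root)
    return g_star[root]

# Depth-2 binary tree: root 0 with children 1, 2; leaves 3-6. Belief weights are
# hypothetical. The optimum activates (0, 2) and (1, 3) for a value of 1.9.
children = {0: [1, 2], 1: [3, 4], 2: [5, 6]}
weight = {(0, 1): 1.0, (0, 2): 1.0, (1, 3): 0.9, (1, 4): 0.1, (2, 5): 0.2, (2, 6): 0.1}
assert abs(max_weight_tree_matching(children, weight, 0) - 1.9) < 1e-9
```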
While (4.38) - (4.40) give the optimal centralized schedule for a specific observation
of delayed CSI, computing the average sum rate requires taking an expectation over
the delayed CSI. For smaller trees, of depth 2, a closed-form expression for the average
sum-rate is given in Section 4.3.3. For larger trees, this analysis becomes difficult;
thus, bounds on the expected solution to the DP are derived in Section 4.3.3.
Let Cnk be the average sum rate of the centralized algorithm over a full k-ary tree
of depth n, when the root node is chosen to be the controller. Hence, the root node
makes a decision for each link in the network based on delayed CSI, where delays
are proportional to depth.
Sum-Rate Analysis for Trees of Depth 2
The centralized scheduling algorithm does not yield a simple recursive expression for
sum-rate throughput, as in the distributed case; however, the centralized sum-rate is
analytically characterized for simple trees. For a binary tree of depth 1, the centralized
scheduling algorithm and the distributed scheduling algorithm are equivalent, since
in both cases, decisions are made with full CSI. Therefore, the sum rate is 3/4. Now
consider a binary tree of depth 2. The expected sum rate is computed by conditioning
on the channel state of the links adjacent to the controller. If both links to the root
are OFF, neither will be scheduled. If only one adjacent link is ON, it will always be
scheduled. If both links are ON, then the controller must use the state of the links
at depth 2 to determine which adjacent link to schedule.
C²_2 = (1/4) · 2[(3/4)(1 − p) + (1/4)p] + (1/2)[1 + (3/4)(1 − p) + (1/4)p] + (1/4)[1 + (15/16)(1 − p) + (1/16)p]   (4.41)

= 3/4 + (3/4)(1 − p) + (1/4)p + (1/4)[(15/16)(1 − p) + (1/16)p]   (4.42)
Unlike the distributed case, the performance of the centralized scheduler is clearly
dependent on the link transition probability p. If the centralized scheduler has perfect CSI, i.e. p = 0, the sum rate is
111
,
64
and when p = 12 , the sum rate is
11
.
8
Thus,
the presence of information leads to a 26% improvement in throughput. Since the
average sum-rate decreases linearly in p, there exists a threshold p∗ , such that distributed scheduling outperforms centralized scheduling for p ≥ p∗ , as in Theorem 17.
Evaluating (4.33) for n = 2 gives the expected throughput of the distributed policy, 27/16. Combining this with (4.42) gives the threshold p*:
\[
p^* = \frac{3}{46} \approx 0.065 \tag{4.43}
\]
Recall that the amount of memory in the channel state process is µ = 1 − 2p. Small
values of p imply that the controller has very good knowledge of the network state,
and there is little penalty to using delayed CSI. On the other hand, as p becomes large and the controller has stale information, it makes inaccurate decisions regarding
the links on the second level of the tree.
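The depth-2 calculation above can be checked with exact rational arithmetic. The sketch below is illustrative (not from the thesis); it evaluates (4.42), confirms the two endpoint values, and recovers the threshold by equating the (linear-in-p) centralized rate with the distributed rate 27/16.

```python
from fractions import Fraction as F

def c22(p):
    """Centralized sum rate on a depth-2 binary tree, following (4.42)."""
    return F(3, 4) + F(3, 4)*(1 - p) + F(1, 4)*p \
        + F(1, 4)*(F(15, 16)*(1 - p) + F(1, 16)*p)

assert c22(F(0)) == F(111, 64)        # perfect CSI
assert c22(F(1, 2)) == F(11, 8)       # memoryless channel

# The sum rate is linear in p, so the crossover with the distributed
# throughput 27/16 can be solved for directly.
slope = c22(F(1)) - c22(F(0))
p_star = (F(27, 16) - c22(F(0))) / slope
assert p_star == F(3, 46)
```

Using Fraction avoids any floating-point ambiguity when comparing against the closed-form values.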
We now apply a similar analysis to a 2-level, k-ary tree by conditioning on the
state of each of the root’s k neighboring links.
\[
\begin{aligned}
C_2^k = {} & \left(\tfrac{1}{2}\right)^k \cdot k\left[\left(1-\left(\tfrac{1}{2}\right)^k\right)(1-p) + \left(\tfrac{1}{2}\right)^k p\right] \\
& + \sum_{n=1}^{k} \binom{k}{n}\left(\tfrac{1}{2}\right)^k \bigg[1 + (k-n)\Big(\big(1-\big(\tfrac{1}{2}\big)^k\big)(1-p) + \big(\tfrac{1}{2}\big)^k p\Big) + \left(1-\left(\tfrac{1}{2}\right)^k\right)^n (n-1)(1-p) \\
& \qquad\qquad + \sum_{m=0}^{n-1} \binom{n}{m}\left(1-\left(\tfrac{1}{2}\right)^k\right)^m \left(\tfrac{1}{2}\right)^{k(n-m)} \big(m(1-p) + (n-m-1)p\big)\bigg] \tag{4.44} \\
= {} & (1-p)(k+1) + \frac{p(k-1) - (1-p)k}{2^k} - (1-2p)\left(1-\left(\tfrac{1}{2}\right)^{k+1}\right)^k \tag{4.45}
\end{aligned}
\]
Comparing this to the value of D2k in (4.36), we solve for the value of p∗ (k) such
that for p ≥ p∗ (k), the distributed policy outperforms the centralized policy.
\[
p^*(k) = \frac{2^k - 1 + \left(\tfrac{1}{2}\right)^k - \left(2-\left(\tfrac{1}{2}\right)^k\right)^k}{(k+1)2^k - (2k-1) - 2\left(2-\left(\tfrac{1}{2}\right)^k\right)^k} \tag{4.46}
\]
The function p∗ (k) is plotted in Figure 4-11. As k increases and the tree becomes
wider, the threshold beyond which distributed scheduling outperforms centralized
scheduling decreases exponentially, implying that distributed scheduling performs
comparatively better for larger networks. Intuitively, as the tree grows wider, the
probability of a "missed opportunity" scenario decreases. For large networks, the drawback of a distributed solution is reduced, while the drawback of a centralized approach, namely the delay in CSI, remains constant. This is reflected in the fact that as k → ∞, the throughput of the distributed policy approaches the centralized throughput with perfect information.
Figure 4-11: Threshold value p*(k) such that for p > p*(k), distributed scheduling outperforms centralized scheduling on a 2-level, k-ary tree.
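The behavior plotted in Figure 4-11 can be reproduced numerically. The sketch below uses the closed form of (4.46) as reconstructed above (an assumption, cross-checked against the exact value p*(2) = 3/46) and confirms that the threshold is strictly decreasing in k.

```python
from fractions import Fraction as F

def p_star(k):
    """Threshold p*(k) of (4.46) for the 2-level k-ary tree
    (reconstructed closed form; checked at k = 2)."""
    b = F(1, 2)**k
    num = 2**k - 1 + b - (2 - b)**k
    den = (k + 1)*2**k - (2*k - 1) - 2*(2 - b)**k
    return num / den

assert p_star(2) == F(3, 46)
# The threshold shrinks as the tree widens, so distributed scheduling
# dominates over an increasingly wide range of p.
assert all(p_star(k + 1) < p_star(k) for k in range(2, 10))
```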
An Upper Bound on the Sum-Rate of Centralized Scheduling
In this section, the sum-rate of the centralized scheduler is upper bounded to provide
a sufficient condition for the existence of a distributed algorithm which outperforms
the optimal centralized algorithm. The upper bound is constructed by recursively
bounding the throughput attainable over a subtree. Let C_n^k(δ) be the expected sum-rate of a complete, k-ary subtree of depth n, where the root of that subtree is a distance of δ hops from the controller. Thus, the CSI of a link at depth h in the subtree is delayed by δ + h − 1 time slots. Note that C_n^k(0) = C_n^k as defined in Section 4.3.3.
To begin, consider the case of k = 2, i.e. the topology is a complete binary tree.
For a binary tree rooted at node v, let cL and cR be the left and right children of v
respectively. The expected sum-rate is bounded by enumerating the possible states
of the links incident to the controller. Label the links adjacent to the root as a and
b. If both links a and b are OFF, as in Figure 4-12a, then the root schedules neither
link, and instead schedules links over the two n − 1 depth subtrees. If only link a (link
b) is ON, then link a (b) will be scheduled, and the links adjacent to that link cannot
be scheduled, as in Figure 4-12b (Figure 4-12c). If both a and b are ON, then the
controller chooses the maximum between the scenarios in Figure 4-12b and Figure
4-12c. Combining these cases leads to an expression for centralized throughput.
\[
\begin{aligned}
C_n^2 &= \frac{1}{4}\cdot 2C_{n-1}^2(1) + \frac{1}{2}\left(1 + C_{n-1}^2(1) + 2C_{n-2}^2(2)\right) + \frac{1}{4}\left(1 + \mathbb{E}\left[\max\big(g_1(c_L)+g_2(c_R),\; g_2(c_L)+g_1(c_R)\big)\right]\right) \tag{4.47} \\
&\leq \frac{3}{4} + C_{n-1}^2(1) + C_{n-2}^2(2) + \frac{1}{4}\,\mathbb{E}\big[g_1(c_L) + g_1(c_R)\big] \tag{4.48} \\
&= \frac{3}{4} + \frac{3}{2}C_{n-1}^2(1) + C_{n-2}^2(2) \tag{4.49}
\end{aligned}
\]
where g1 (·) and g2 (·) are defined in (4.39) and (4.40). The bound in (4.48) follows
from the fact that g1 (u) ≥ g2 (u) for any node u ∈ N . In order to get a recursive
expression for Cn2 , we also need to bound Cn2 (δ).
Let φl (s) be an indicator variable equal to 1 if and only if link l is activated in
the optimal schedule when the delayed CSI of the network is given by s. Similarly,
let φδl (s) be an indicator variable equal to 1 if and only if link l is activated in the
optimal schedule when the CSI is further delayed by δ slots. Applying (4.16), the
centralized sum rates are expressed as
\[
C_n^2(0) = \sum_{s\in\mathcal{S}} \mathbb{P}(S(t-d_r)=s) \sum_{l\in\mathcal{L}} \phi_l(s)\, p_{s_l,1}^{d_r(l)}, \tag{4.50}
\]
\[
C_n^2(\delta) = \sum_{s\in\mathcal{S}} \mathbb{P}(S(t-d_r)=s) \sum_{l\in\mathcal{L}} \phi_l^\delta(s)\, p_{s_l,1}^{d_r(l)+\delta}. \tag{4.51}
\]
Equation (4.51) is bounded in terms of (4.50):
Figure 4-12: Possible scheduling scenarios for the centralized scheduler. (a) Link a and link b are not activated; the expected throughput is computed by the maximum expected matching over the solid links, 2C_{n-1}^2(1). (b) If link a is scheduled, the dashed links cannot be scheduled, and the solid links can. (c) When link b is scheduled, the dashed links cannot be scheduled, but the solid links can.
\[
\begin{aligned}
C_n^2(\delta) &= (1-2p)^\delta \sum_{s\in\mathcal{S}} \mathbb{P}(S(t-d_r)=s) \sum_{l\in\mathcal{L}} \phi_l^\delta(s)\, p_{s_l,1}^{d_r(l)} + \sum_{s\in\mathcal{S}} \mathbb{P}(S(t-d_r)=s) \sum_{l\in\mathcal{L}} \phi_l^\delta(s)\, p_{0,1}^\delta \tag{4.52} \\
&\leq (1-2p)^\delta \sum_{s\in\mathcal{S}} \mathbb{P}(S(t-d_r)=s) \sum_{l\in\mathcal{L}} \phi_l(s)\, p_{s_l,1}^{d_r(l)} + \sum_{s\in\mathcal{S}} \mathbb{P}(S(t-d_r)=s) \sum_{l\in\mathcal{L}} \phi_l^\delta(s)\, p_{0,1}^\delta \tag{4.53} \\
&= (1-2p)^\delta C_n^2(0) + p_{0,1}^\delta\, \mathbb{E}[\text{Number of Links Activated}] \tag{4.54} \\
&\leq (1-2p)^\delta C_n^2(0) + p_{0,1}^\delta \left\lceil \tfrac{1}{3}\,\mathbb{E}[\text{Number of Links}] \right\rceil \tag{4.55} \\
&\leq (1-2p)^\delta C_n^2(0) + p_{0,1}^\delta \left( \tfrac{1}{3}\,\mathbb{E}[\text{Number of Links}] + 1 \right) \tag{4.56} \\
&= (1-2p)^\delta C_n^2(0) + \frac{p_{0,1}^\delta}{3}\left(2^{n+1} + 1\right) \tag{4.57}
\end{aligned}
\]
Equation (4.52) follows from the identity p_{s,1}^{i+j} = p_{0,1}^{j} + (1 − 2p)^{j} p_{s,1}^{i}. Equation
(4.53) follows from the fact that φl (s) is the sum-rate maximizing schedule in Cn2 (0).
The bound in (4.55) follows from noting that at most one third of the links can be
simultaneously scheduled due to interference. Combining the bound in (4.57) with
that in (4.49) yields a recursive expression from which the upper bound is computed.
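The transition-probability identity behind (4.52) is easy to verify numerically by iterating the one-step belief update of the symmetric chain (a quick sanity-check sketch, not part of the thesis):

```python
def p_s1(p, s, n):
    """n-step probability that the channel is ON, starting from state s,
    for the symmetric chain with transition probability p (q = p)."""
    x = float(s)
    for _ in range(n):
        x = x*(1 - p) + (1 - x)*p
    return x

# Identity used in (4.52): p_{s,1}^{i+j} = p_{0,1}^j + (1-2p)^j * p_{s,1}^i
p = 0.2
for s in (0, 1):
    for i, j in ((1, 1), (2, 3), (4, 2)):
        lhs = p_s1(p, s, i + j)
        rhs = p_s1(p, 0, j) + (1 - 2*p)**j * p_s1(p, s, i)
        assert abs(lhs - rhs) < 1e-12
```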
\[
C_n^2 \leq \frac{3}{4} + \frac{3}{2}(1-2p)C_{n-1}^2 + \frac{1}{2}p\left(2^n+1\right) + (1-2p)^2 C_{n-2}^2 + \frac{2}{3}p(1-p)\left(2^{n-1}+1\right) \tag{4.58}
\]
Solving the recursion in (4.58) yields a closed-form upper bound on the expected
sum-rate throughput achievable by a centralized scheduler.
\[
\begin{aligned}
C_n^2 \leq {} & \frac{1}{80}\,\frac{(-146p-15)(p-1/2)^n}{(2p-1)^2} + \frac{1}{320}\,\frac{(-94p+135)(-4p+2)^n}{(2p-1)^2} - \frac{1}{6}\,\frac{-8p^2+14p+9}{(4p-1)(2p-3)} \\
& + \frac{1}{30}\,\frac{\left(160p^2-248p-36\right)(p-1/2)^n}{(2p-3)(2p-1)^2} - \frac{1}{60}\,\frac{\left(40p^2-102p+11\right)(-4p+2)^n}{(2p-1)^2(4p-1)} + \frac{2^n}{3} \tag{4.59}
\end{aligned}
\]
To interpret this bound, we compute the limiting ratio of the centralized throughput to the number of links in the tree (for p > 0).
\[
\lim_{n\to\infty} \frac{C_n^2}{2^{n+1}-2} = \frac{1}{6} \tag{4.60}
\]
Note that this value is independent of p. This is because as long as the controller does not have perfect knowledge (i.e. p > 0), as n grows large, infinitely many nodes are sufficiently far from the root that the controller has no knowledge of their current state. One third of these links are scheduled (the size of a maximum cardinality matching), and each is in the ON state with probability 1/2. Hence, the limiting per-link throughput is 1/6. Recall from (4.34) that the per-link average sum-rate under distributed scheduling is 3/11. Therefore, as the network grows large, distributed scheduling eventually outperforms centralized scheduling, regardless of the memory in the channel state process.
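The limiting behavior can be observed directly by iterating the recursion (4.58). In the sketch below the seed values C_0 = 0 and C_1 = 3/4 are assumptions (a depth-1 binary tree is scheduled with perfect CSI; the thesis's seed values are not shown in this excerpt), and the per-link ratio is checked against (4.60).

```python
def c_upper(n, p):
    """Iterate the recursion (4.58) for the centralized upper bound on a
    depth-n binary tree; seeds C_0 = 0 and C_1 = 3/4 are assumed."""
    C = [0.0, 0.75]
    for m in range(2, n + 1):
        C.append(0.75 + 1.5*(1 - 2*p)*C[m - 1] + 0.5*p*(2**m + 1)
                 + (1 - 2*p)**2 * C[m - 2]
                 + (2/3)*p*(1 - p)*(2**(m - 1) + 1))
    return C[n]

# The per-link bound approaches 1/6 for any p > 0, matching (4.60).
for p in (0.1, 0.3, 0.5):
    ratio = c_upper(20, p) / (2**21 - 2)   # 2^{n+1} - 2 links at depth 20
    assert abs(ratio - 1/6) < 0.01
```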
Additionally, the threshold p*(n) at which distributed scheduling outperforms centralized scheduling is bounded for a tree of depth n by equating (4.33) and (4.59). Figure 4-13 shows this threshold as a function of n. Note that as n gets large, this threshold approaches zero, implying that distributed scheduling is always better than centralized scheduling in these cases, as expected from the asymptotic analysis.
Figure 4-13: Threshold value p*(n) such that for p > p*(n), distributed scheduling outperforms centralized scheduling on an n-level binary tree.
The bound also extends to k-ary trees. The bound in (4.55) is adapted to k-ary trees through the observation that a complete k-ary tree of depth n has (k^{n+1} − k)/(k − 1) links.
\[
\begin{aligned}
C_n^k(\delta) &= (1-2p)^\delta C_n^k(0) + p_{0,1}^\delta\, \mathbb{E}[\text{Number of Links Activated}] \tag{4.61} \\
&\leq (1-2p)^\delta C_n^k(0) + p_{0,1}^\delta \left\lceil \tfrac{1}{k+1}\,\mathbb{E}[\text{Number of Links}] \right\rceil \tag{4.62} \\
&\leq (1-2p)^\delta C_n^k(0) + p_{0,1}^\delta \left( \tfrac{1}{k+1}\,\mathbb{E}[\text{Number of Links}] + 1 \right) \tag{4.63} \\
&= (1-2p)^\delta C_n^k(0) + p_{0,1}^\delta \left( \frac{k^{n+1}-k}{k^2-1} + 1 \right) \tag{4.64}
\end{aligned}
\]
The bound in (4.64) is used to compute a recursion from which the upper bound is derived. We bound C_n^k using the same strategy as in (4.49).
\[
\begin{aligned}
C_n^k &\leq \left(\tfrac{1}{2}\right)^k \cdot kC_{n-1}^k(1) + k\left(\tfrac{1}{2}\right)^k \left(1 + kC_{n-2}^k(2) + (k-1)C_{n-1}^k(1)\right) + \sum_{m=2}^{k} \binom{k}{m}\left(\tfrac{1}{2}\right)^k \left(1 + kC_{n-1}^k(1)\right) \tag{4.65} \\
&= 1 - \left(\tfrac{1}{2}\right)^k + k\left(1-\left(\tfrac{1}{2}\right)^k\right)C_{n-1}^k(1) + k^2\left(\tfrac{1}{2}\right)^k C_{n-2}^k(2) \tag{4.66}
\end{aligned}
\]
Combining the bound in (4.66) with that in (4.64) yields
\[
\begin{aligned}
C_n^k \leq {} & k\left(1-\left(\tfrac{1}{2}\right)^k\right)(1-2p)C_{n-1}^k + 1 - \left(\tfrac{1}{2}\right)^k + k^2\left(\tfrac{1}{2}\right)^k(1-2p)^2 C_{n-2}^k \\
& + k\left(1-\left(\tfrac{1}{2}\right)^k\right) p\left(\frac{k^n-k}{k^2-1}+1\right) + k^2\left(\tfrac{1}{2}\right)^k\, 2p(1-p)\left(\frac{k^{n-1}-k}{k^2-1}+1\right) \tag{4.67}
\end{aligned}
\]
Inequality (4.67) can be solved to yield a closed-form upper bound on the centralized sum-rate for large trees.
4.4 Clique Topologies
In addition, we consider fully-connected mesh networks (i.e. clique topologies), in
which each pair of nodes is connected. Compared to tree networks in Section 4.3,
mesh networks have a much smaller diameter, resulting in the centralized approach
having access to fresher CSI.
4.4.1 Centralized Scheduling
Consider a fully-connected network where the channel state at each link is independent and identically distributed according to the Markov chain in Figure 4-2. In this
network, an arbitrary node is chosen as the controller; the choice of controller does
not affect throughput due to the network symmetry. In an N -node mesh, the controller is connected to each other node, such that the controller has full information
on N − 1 links, and one-hop delayed information for the other (N − 1)(N − 2)/2 links.
The average sum-rate attainable by a centralized controller is upper bounded by
assuming there exists a maximum cardinality matching consisting of ON links (links
with belief greater than the steady state probability). The probability of this event
occurring increases with the size of the network; consequently, this bound becomes
tight as the network size increases. If the controller finds such a matching, the expected sum-rate is given by
CnU B
n−2
=1+
(1 − q),
2
(4.68)
where q is the transition probability from ON to OFF, and b n−2
c is the size of the
2
maximum cardinality matching in the graph that remains after a link emanating from
the controller has been included in the matching.
4.4.2 Distributed Scheduling
Next, we apply the distributed scheme to a clique topology. The distributed algorithm
operates as follows: a node transmits over a randomly chosen ON neighboring link, if
one exists, and otherwise does not transmit. Then the next node repeats this process,
only considering ON links which do not interfere with any previously scheduled links.
The average achievable sum-rate of this algorithm is computed recursively as follows. The first node to transmit has a probability 1 − (1 − π)^{n−1} of having an adjacent link in the ON state, where π is the steady state probability defined in (4.5). If there
exists an ON link, the two nodes adjacent to that link cannot activate any other
links, so the next node schedules over an n − 2 node clique. On the other hand, if no
neighboring links are ON, then no links are activated, and the next node schedules
over an n − 1 node clique. The sum-rate is lower bounded by assuming that the next
node to transmit always schedules over an n − 2 node clique, regardless of whether or
not an ON link was found. This technique restricts the space of potential matchings
which can be activated, and thus results in a lower bound on expected throughput.
\[
\begin{aligned}
D_n &= \left(1-(1-\pi)^{n-1}\right)(1 + D_{n-2}) + (1-\pi)^{n-1} D_{n-1} \tag{4.69} \\
&\geq \left(1-(1-\pi)^{n-1}\right) + D_{n-2} \tag{4.70}
\end{aligned}
\]
Equation (4.70) yields a recursion which is solved to lower bound the average
sum-rate of the distributed priority scheduler.
\[
D_n \geq \frac{\pi(-1)^n}{2} + \frac{\pi}{2} + \frac{(1-\pi)^{n+1}}{\pi(2-\pi)} - \frac{\pi(3-2\pi)(-1)^n}{8-4\pi} + \frac{n+1}{2} - \frac{2\pi^2+\pi+2}{4\pi} \tag{4.71}
\]
In the case where p = q (π = 1/2), this equation simplifies to
\[
D_n \geq \frac{(-1)^n}{12} - \frac{3}{4} + \frac{2}{3}\left(\frac{1}{2}\right)^n + \frac{n}{2}. \tag{4.72}
\]
As n increases, the expected fraction of nodes with an ON neighboring link tends
to 1, implying that this bound is also asymptotically tight.
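The lower bound can be compared against the exact recursion (4.69). The sketch below (illustrative; the seeds D_0 = D_1 = 0 are an assumption, since a single node has no link to schedule) verifies that the closed form (4.72) is a valid and nearly tight lower bound at π = 1/2.

```python
def d_exact(n, pi=0.5):
    """Exact recursion (4.69) for the distributed sum rate on an n-node
    clique; D_0 = D_1 = 0 are assumed seeds."""
    D = [0.0, 0.0]
    for m in range(2, n + 1):
        hit = 1 - (1 - pi)**(m - 1)   # some neighboring link is ON
        D.append(hit*(1 + D[m - 2]) + (1 - hit)*D[m - 1])
    return D[n]

def d_bound(n):
    """Closed-form lower bound (4.72), valid for pi = 1/2."""
    return (-1)**n/12 - 0.75 + (2/3)*0.5**n + n/2

# The bound holds for every n and stays close to the exact value.
for n in range(2, 25):
    assert d_exact(n) >= d_bound(n) - 1e-9
assert d_exact(24) - d_bound(24) < 0.2
```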
4.4.3 Comparison
The bounds in (4.72) and (4.68) combine to give a bound on p∗ , the value of the
transition probability (for a symmetric channel) after which there exists a distributed
policy that performs at least as well as the optimal centralized policy. For n even,
the bound is given by
\[
p^* \leq \frac{\frac{4}{3}\left(1 - \left(\frac{1}{2}\right)^n\right)}{n-2}. \tag{4.73}
\]
Similarly, for odd values of n, combining (4.68) and (4.72) yields
\[
p^* \leq \frac{\frac{4}{3}\left(\frac{1}{2} - \left(\frac{1}{2}\right)^n\right)}{n-3}. \tag{4.74}
\]
Clearly, as n grows large, the distributed algorithm outperforms the optimal centralized scheduler for a wider range of channel transition probabilities p, since the
upper bound goes to 0.
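These bounds are immediate to evaluate; the brief sketch below (illustrative, not from the thesis) checks the value for the 10-node mesh simulated in Section 4.5 and the vanishing behavior for large cliques.

```python
def p_star_bound(n):
    """Upper bound (4.73)/(4.74) on the crossover probability p* for an
    n-node clique with symmetric channels (p = q)."""
    if n % 2 == 0:
        return (4/3)*(1 - 0.5**n)/(n - 2)
    return (4/3)*(0.5 - 0.5**n)/(n - 3)

# For the 10-node mesh of Section 4.5 the bound is ~0.1665, matching the
# simulated crossover near p = 0.16.
assert abs(p_star_bound(10) - 0.1665) < 1e-3
# The bound vanishes as the clique grows.
assert p_star_bound(100) < 0.02 and p_star_bound(101) < 0.01
```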
4.5 Simulation Results
In this section, the performance of the distributed policy is compared to the performance of a centralized controller through simulation. For the centralized case, a
controller is chosen (off-line), and a maximum weighted matching is computed over
the network, where the weight of each link is equal to the belief of that link. This is
compared to a distributed approach in which priorities are assigned in reverse order
of degree. For each network, we simulate decisions over 100,000 time slots. Each
simulation assumes a symmetric channel state model (p = q).
To begin, consider the six node network in Figure 4-14, where the centralized controller is located at node zero. The average sum-rate throughput as a fraction of the perfect-CSI throughput (i.e., the throughput attainable by a centralized scheduler with perfect CSI) is plotted as a function of the channel state transition probability p in Figure 4-15. In Figure 4-17, the simulation is applied to the five-by-five grid network of Figure 4-16, where the centralized controller is located at the central-most node. Lastly, the simulation is applied to a 10-node, fully connected mesh network in Figure 4-18.
These results show that for small p, modeling channels with high degrees of memory, a purely centralized controller is optimal. As p increases, eventually the distributed scheme outperforms the centralized scheme in each case. In Figure 4-18, we
Figure 4-14: A six-node sample network
see that at p ≥ 0.16, the distributed algorithm outperforms the centralized algorithm. Recall that the bound on p* found in (4.73) for cliques gives p* ≤ 0.1665 in this case, so the theoretical bound agrees closely with the observed simulation results. Additionally, comparing the results for the 5x5 grid in Figure 4-17 with the clique in Figure 4-18, it is evident that the threshold p* is higher in the clique. This is because the information available to the centralized scheduler is less delayed in the clique than in the grid, where the diameter is larger. This illustrates the effect of the topology on the resulting performance of each scheduling approach.
Figure 4-15: Results for the six node network in Figure 4-14, over a horizon of 100,000 time
slots. The plot shows the fraction of the perfect-CSI throughput obtained as a function of
p, the transition probability of the channel state.
Figure 4-16: A 5x5 grid network
Figure 4-17: Results for a 5 x 5 grid network, over a horizon of 100,000 time slots. The plot
shows the fraction of the perfect-CSI throughput obtained as a function of p, the transition
probability of the channel state.
Figure 4-18: Results for 10-node clique topology, over a horizon of 100,000 time slots. The
plot shows the fraction of the perfect-CSI throughput obtained as a function of p, the
transition probability of the channel state.
4.6 Partially Distributed Scheduling
Up to this point, this chapter has compared the performance of distributed scheduling
with the performance of the optimal centralized schedule using delayed CSI. For large
networks, the delayed CSI causes a reduction in the throughput of the centralized
scheduler, as the links far from the controller have channel states largely independent
of the available CSI at the controller. A distributed scheme is shown to outperform
the centralized scheme in these scenarios; however, distributed policies suffer from the
inability to compute a globally optimal schedule. An alternative to fully distributed scheduling is a partially distributed scheme, in which multiple controllers are used to schedule the links in local neighborhoods. In this section, we consider applying a partially distributed scheduling scheme to a binary tree, and show that this scheme can outperform both the fully centralized and fully distributed approaches.
Consider an infinitely-deep binary tree. A single centralized controller has no information pertaining to the majority of the network, and at most attains an average per-link throughput of 1/6, as shown in (4.60). We have shown that a distributed scheme outperforms the centralized scheme in this scenario. Now, we consider a
partially distributed scheme to retain some of the benefits of centralized scheduling.
(a) Subtree of depth 2
(b) Subtree of depth 3
Figure 4-19: Example subtrees from tree-partitioning algorithm
Figure 4-20: Example partitioning of the infinite tree (only the first four levels are shown), into subtrees of depth 1 (a) and depth 2 (b). Dashed links, dotted links, and solid links each belong to different subtrees. The solid nodes represent controllers, which are located at the root of each subtree. Nodes labeled with B are border nodes.
The full binary tree is partitioned into subtrees of depth k, such that each non-leaf
node in the subtree has degree 3. Subtrees of depth 2 and 3 are shown in Figure 4-19,
and an example partitioning is shown in Figure 4-20. Observe that there exists a
partitioning with subtrees of any depth. Each node in the original binary tree either
belongs to one subtree or to three subtrees. Define a border node to be a node which
belongs to three subtrees, as illustrated by the nodes labeled B in Figure 4-20.
After the binary tree is partitioned, a controller is placed in each partition such
that the resulting rooted subtree has the desired depth. Each controller computes a
schedule for its partition, using delayed CSI pertaining to the links in the subtree.
In order to eliminate inter-subtree interference, multiple controllers cannot activate
links adjacent to the same border node simultaneously. Consider an algorithm which
Figure 4-21: Illustration of border link labeling scheme
disables a set of links, such that a disabled link cannot be activated. We propose a
link disabling algorithm with the result that different control regions cannot interfere
with one another. Note that this link-disabling scheme is inspired by the work in [58].
Theorem 19. It is sufficient to disable one link per subtree to completely eliminate
inter-subtree interference.
Proof. To begin, note that inter-subtree interference only occurs at border nodes.
Furthermore, each border node has degree three, and each link adjacent to the border
node belongs to a different subtree. Based on the visualization of the tree in Figure
4-21, the three adjacent links at each border node are labeled as either U , L, or R,
denoting whether the link is the upmost link, the left link, or the right link incident
to the node. In each subtree, all leaves are border nodes, and a subtree of depth k will have 3 · 2^{k−1} leaves. Based on the partitioning scheme, one of the leaf links in
each subtree is an L link or an R link, and the remainder of the leaf links will be U
links, as illustrated in Figure 4-21.
Consider the policy which disables all links labeled L or R. Each border node
now has only one adjacent enabled link (the link labeled U ), and thus interference is
removed between subtrees. Furthermore, since each subtree only has one L or one R
leaf link, only one link is disabled per subtree.
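The leaf count used in the proof can be checked by explicit construction (an illustrative sketch, not part of the thesis):

```python
def count_leaves(depth):
    """Leaves of a depth-k partition subtree: the root has 3 children,
    and every other internal node has 2 children (degree 3 counting
    its parent link)."""
    frontier = 3            # nodes one level below the root
    for _ in range(depth - 1):
        frontier *= 2
    return frontier

# A depth-k subtree has 3 * 2^(k-1) leaves, all border nodes; only the
# single L or R leaf link among them is disabled.
assert [count_leaves(k) for k in (1, 2, 3, 4)] == [3, 6, 12, 24]
assert all(count_leaves(k) == 3 * 2**(k - 1) for k in range(1, 12))
```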
The above scheme for inter-subtree contention resolution disables one link per
subtree, leading to a loss in throughput. As the size of the subtree grows, this
loss becomes negligible. Figure 4-22 shows the per-link throughput as a function
of the state transition probability p for various subtree sizes. For small values of
p, using subtrees of a larger depth yields higher throughput, as the delayed CSI is
useful. As p increases and delayed CSI becomes less valuable, it becomes optimal
to use less information and add more controllers. Note that a partitioning with subtrees of depth 1 is fully distributed in the sense of this chapter, as controllers use only local information with which to make scheduling decisions. This plot illustrates a
region in which partially distributed scheduling outperforms both fully centralized and
fully distributed solutions. Intuitively, by dividing the network into control regions,
centralized scheduling is used in each region, and distributed scheduling is used across
regions, providing a trade-off between using delayed CSI and making local decisions.
Figure 4-22: Per-link throughput of the tree partitioning scheme, plotted as a function of
transition probability p for various subtree depths.
4.7 Conclusion
In this chapter, we studied the effect of using delayed channel state information (CSI)
on the throughput of wireless scheduling. We showed that a centralized scheduling
approach, while optimal with perfect CSI, suffers from having delayed CSI. Consequently, we showed that for rapidly-varying channels, distributed scheduling outperforms centralized scheduling. Similarly, as networks grow larger, distributed approaches become preferable.
Since centralized policies are constrained to using delayed CSI, the location of the
controller has an effect on the throughput performance of the scheduling algorithm.
The choice of controller location corresponds to a choice of which information is
accurate, and which information is delayed. Thus, controllers should be placed at locations that are central, both in terms of node degree and in terms of the hop-based center of the network, so that more information is available with minimal delay. The
problem of controller placement is addressed in Chapter 5.
Chapter 5
Controller Placement for Maximum Throughput
In the previous chapter, we established that channel state information (CSI) delays
are inherent to centralized wireless scheduling. In deploying such a scheme, one node
is assigned the role of a controller, and collects CSI from the rest of the network.
Then, the controller uses this CSI to select a set of nodes to transmit in each slot,
in order to maximize throughput and avoid interference between neighboring links.
As discussed in Chapter 4, CSI updates from distant links arrive at the controller
after a delay that grows with the distance of the links from the controller. Since CSI
delay reduces the throughput of the scheduling algorithm [69], the placement of the
controller directly impacts network performance.
The aim of this chapter is to study the impact of the controller placement on
network performance. In Section 5.1, we analyze the static controller placement
problem, in which the controller placement is computed off-line, and remains fixed
over time. We provide an optimal formulation, which can be solved numerically for
smaller networks, and a heuristic algorithm to compute a near-optimal controller
placement for large networks.
For a static controller placement, links near the controller achieve a high throughput, while links further away from the controller attain a lower throughput, due to
the CSI delay at the controller. In order to mitigate this imbalance, the second half
Figure 5-1: Markov Chain describing the channel state evolution of each independent channel. State 0 corresponds to an OFF channel, while state 1 corresponds to an ON channel.
of this chapter investigates dynamic controller placement schemes, which change the
location of the controller over time. This allows for the controller to be moved to a
region of the network with high backlogs to increase throughput to this region and
provide stability. In Section 5.2, we propose a dynamic controller placement framework, where the controller is repositioned on-line. Since at any time, each node has
a different view of the network state due to the distance-based CSI delays, the controller placement algorithm must only depend on information shared by all nodes,
such that no additional communication overhead is required. First, we propose a
queue-length based controller placement algorithm, and show that this algorithm offers an increased throughput over a static placement. We propose a joint controller
placement and scheduling algorithm which is shown to be throughput optimal over
the considered policy-space. Second, we consider policies which use delayed CSI as
well as delayed queue length information (QLI), and find a throughput optimal policy
over this policy-space, while characterizing the improvement obtained in using this
extra CSI.
5.1 Static Controller Placement
In this section, we consider an off-line controller placement, such that the controller
remains fixed over time. We show that the optimal controller placement depends on
the network topology as well as the channel transition probabilities.
5.1.1 System Model
Consider a network G(N, L) consisting of a set of nodes N and a set of links L. Each link is associated with an independent, time-varying channel, which is either ON or OFF. Let S_l(t) ∈ {OFF, ON} be the state of the channel at link l at time t.
Assume the channel state evolves over time according to the Markov chain in Figure
5-1. One of the nodes is assigned to be the controller, and in each time slot, activates
a subset of links for transmission. Assume a primary interference constraint in which
a link activation is feasible if the activation is a matching, i.e. no two neighboring
links are activated. If link l is activated, and Sl (t) = ON, then a packet is successfully
transmitted at that time slot. On the other hand, if the channel at link l is OFF,
then the transmission fails. The objective of the controller is to activate the set of
links resulting in maximum expected sum-rate throughput.
In order to determine the correct subset of links to activate, the controller obtains
channel state information (CSI) from each link in the network, and uses the CSI to
compute a feasible link activation with maximum expected throughput. The scheduling problem for a fixed controller was presented in Chapter 4. Due to the physical
distance between network nodes, and the propagation delay across each link, the CSI
updates received at the controller are delayed proportional to the distance between
each link and the controller. In particular, let d_i(j) be the (symmetric) distance in hops between node i and node j. At time t, each node i has delayed CSI pertaining to node j from time-slot t − d_i(j). In other words, node i has CSI S_j(t − d_i(j)) for node j.
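The channel model of Figure 5-1 is a two-state (Gilbert–Elliott-style) Markov chain; a minimal simulator, assuming nothing beyond the model above, is sketched below.

```python
import random

def simulate_channel(p, q, T, s0=0, seed=1):
    """Sample one ON/OFF channel path from the chain in Figure 5-1:
    OFF->ON with probability p, ON->OFF with probability q."""
    rng = random.Random(seed)
    s, path = s0, []
    for _ in range(T):
        if s == 0:
            s = 1 if rng.random() < p else 0
        else:
            s = 0 if rng.random() < q else 1
        path.append(s)
    return path

# The empirical ON fraction approaches the stationary probability p/(p+q).
path = simulate_channel(0.2, 0.1, 200000)
assert abs(sum(path) / len(path) - 0.2 / 0.3) < 0.02
```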
5.1.2 Controller Placement Example
To begin, consider the example topology in Figure 5-2, and compare the expected
throughput attainable by placing the controller at node A, node B, or node C. Placing
the controller at node A yields the same expected throughput as placing the controller
at node C, due to the symmetry of the network. Consider a generalization of the
network in Figure 5-2, where A and C have degree k + 1. For simplicity, assume a
Figure 5-2: Barbell Network
symmetric Markov chain in Figure 5-1, i.e. p = q. Let γ = (1/2)^k be the probability that k links are OFF. Placing the controller at node B results in an expected throughput of
\[
\text{thpt}_B = \frac{1}{4}\cdot 2\left((1-\gamma)p_{11}^1 + \gamma p_{01}^1\right) + \frac{1}{2}\left(1 + (1-\gamma)p_{11}^1 + \gamma p_{01}^1\right) + \frac{1}{4}\left(1 + (1-\gamma^2)p_{11}^1 + \gamma^2 p_{01}^1\right). \tag{5.1}
\]
The above expression follows from conditioning on the state of the two adjacent links
to node B. The first term corresponds to the expected throughput when both links are
OFF, the second corresponds to the case when one is ON and the other is OFF, and
the last term corresponds to both links being ON. Similarly, the expected throughput
from a controller at node A is derived by conditioning on the state of the k + 1 links
adjacent to node A.
\[
\begin{aligned}
\text{thpt}_A = {} & (1-\gamma)\left(1 + \tfrac{1}{2}p_{11}^1 + \tfrac{1}{2}\left((1-\gamma)p_{11}^2 + \gamma p_{01}^2\right)\right) \\
& + \gamma\left(\tfrac{1}{2}\left(1 + (1-\gamma)p_{11}^2 + \gamma p_{01}^2\right) + \tfrac{1}{2}\left(\tfrac{1}{2}p_{11}^1 + \tfrac{1}{2}\left((1-\gamma)p_{11}^2 + \gamma p_{01}^2\right)\right)\right) \tag{5.2}
\end{aligned}
\]
Consider the throughput obtained from a controller at A and B in the limit as k
grows to infinity, in which case γ = 0.
\[
\lim_{k\to\infty} \text{thpt}_A = 1 + \tfrac{1}{2}(1-p) + \tfrac{1}{2}p_{11}^2 \tag{5.3}
\]
\[
\lim_{k\to\infty} \text{thpt}_B = \tfrac{3}{4} + \tfrac{5}{4}(1-p). \tag{5.4}
\]
For p ≤ 1/4, it is optimal to place the controller at node B in the center, and for p ≥ 1/4, it is optimal to place the controller at either node A or C. This example highlights some important properties of the controller placement problem. In particular, it is clear that the optimal placement depends on the channel transition probabilities. When p is small, it is advantageous to place the controller to minimize the CSI delay throughout the network. On the other hand, when p is close to 1/2, the CSI is no longer useful and it is better to maximize the degree of the controller, since the controller always has perfect information of its neighboring links.
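The crossover at p = 1/4 can be verified directly from the limits (5.3) and (5.4) (an illustrative sketch, not from the thesis):

```python
def p11(p, d):
    """d-step ON->ON transition probability for the symmetric chain (q = p)."""
    return 0.5 + 0.5*(1 - 2*p)**d

def thpt_A_limit(p):
    # (5.3): controller at node A, in the limit k -> infinity
    return 1 + 0.5*(1 - p) + 0.5*p11(p, 2)

def thpt_B_limit(p):
    # (5.4): controller at the central node B
    return 0.75 + 1.25*(1 - p)

# The two placements break even exactly at p = 1/4.
assert abs(thpt_A_limit(0.25) - thpt_B_limit(0.25)) < 1e-12
assert thpt_B_limit(0.1) > thpt_A_limit(0.1)   # slow channel: center wins
assert thpt_A_limit(0.4) > thpt_B_limit(0.4)   # fast channel: hub wins
```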
5.1.3 Optimal Controller Placement
From the previous example, it is clear that the throughput-maximizing controller
placement is a function of the channel state transition probabilities p and q, as well as
the network topology. In this section, we present a mathematical formulation for the
optimal controller location. Let M be the set of matchings in the network, i.e., ∀M ∈
M, M is a set of links which can be scheduled simultaneously without interfering
with one another. Under a throughput maximization objective, the selected controller
schedules the matching that maximizes expected sum-rate throughput with respect to
the CSI delays at that node. Consequently, the controller placement can be optimized
as follows.
\[
\begin{aligned}
c &= \arg\max_r \mathbb{E}_S\left[ \max_{M\in\mathcal{M}} \sum_{l\in M} \mathbb{E}\left[ S_l(t) \,\middle|\, S_l(t-d_r(l)) = S_l \right] \right] \tag{5.5} \\
&= \arg\max_r \mathbb{E}_S\left[ \max_{M\in\mathcal{M}} \sum_{l\in M} p_{S_l,1}^{d_r(l)} \right] \tag{5.6} \\
&= \arg\max_r \mathbb{E}_S\left[ \max_{M\in\mathcal{M}} \sum_{l\in M} \left( \pi + (S_l-\pi)(1-p-q)^{d_r(l)} \right) \right] \tag{5.7}
\end{aligned}
\]
Equation (5.6) follows since the channel state satisfies S_l(t) ∈ {0, 1}. Equation (5.7) follows from the definition of the k-step transition probability of the channel state Markov chain. Computing a maximum matching requires solving an integer linear program (ILP), which is known to be solvable in O(|L|^3) time [55]. However, computing the optimal controller position in (5.7) requires computing the expectation of the maximum matching, which necessitates solving the ILP for every state sequence S(t) ∈ {0, 1}^{|L|}. Thus, the computational complexity of the controller placement problem is exponential in the number of links, and this computation is intractable for large networks.
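For small networks, the formulation (5.7) can be evaluated by brute force. The sketch below is an illustrative implementation under stated assumptions (the example topology, a 5-node path, is hypothetical and not from the thesis): it enumerates all matchings and all channel states, weights each link by its belief under the controller's CSI delays, and scores every candidate controller.

```python
from itertools import combinations, product

def matchings(links):
    """All matchings (subsets of pairwise non-adjacent links), incl. empty."""
    out = [()]
    for r in range(1, len(links) + 1):
        for sub in combinations(links, r):
            ends = [u for l in sub for u in l]
            if len(ends) == len(set(ends)):
                out.append(sub)
    return out

def placement_values(links, dist, p, q):
    """Evaluate objective (5.7) exactly for every candidate controller r."""
    pi = p / (p + q)                     # steady-state ON probability
    nodes = sorted({u for l in links for u in l})
    M = matchings(links)
    vals = {}
    for r in nodes:
        # CSI delay of a link = hop distance from r to its nearer endpoint
        delay = {l: min(dist(r, l[0]), dist(r, l[1])) for l in links}
        total = 0.0
        for state in product((0, 1), repeat=len(links)):
            prob = 1.0
            for s in state:
                prob *= pi if s else (1 - pi)
            w = {l: pi + (state[i] - pi) * (1 - p - q) ** delay[l]
                 for i, l in enumerate(links)}
            total += prob * max(sum(w[l] for l in m) for m in M)
        vals[r] = total
    return vals

# Hypothetical example: 5-node path 0-1-2-3-4, symmetric channels p = q = 0.1.
links = [(0, 1), (1, 2), (2, 3), (3, 4)]
vals = placement_values(links, lambda r, u: abs(r - u), 0.1, 0.1)
assert abs(vals[0] - vals[4]) < 1e-9   # topology is symmetric
assert vals[2] > vals[0]               # slow channel: the center node wins
```

The exhaustive loop over 2^|L| states is exactly the source of the exponential complexity noted above.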
5.1.4 Effect of Controller Placement
Figure 5-3: (Snowflake Network) Symmetric network in which node A has degree k1 and
node B has degree k2 + 1
Since the computation of the optimal controller placement is difficult, it is important to quantify the sensitivity of the expected throughput to the location of the
controller. Consider the snowflake network in Figure 5-3. Due to the symmetry of the
network, there are three potential controller locations, labeled as nodes A, B, and C.
The optimal controller placement is computed by solving (5.7), and the corresponding
expected sum-rate throughput attainable from a controller at each location is shown
in Figure 5-4 for k1 = 4 and k2 = 20. Placing the controller at node A results in the
maximal sum rate for the majority of channel transition probabilities, except when the transition probability is close to 1/2, at which point node B becomes the optimal location.
Figure 5-4 shows that in some operating regimes, placing the controller at the
wrong location results in significant reduction in expected throughput. In particular,
Figure 5-4: Sum-rate throughput resulting from having controller at three possible node
locations, with k1 = 4 and k2 = 20, as a function of channel transition probability p = q.
for a symmetric channel state model (p = q), if k2 = 2k1 and k1 grows large, the throughput attainable from a controller at node A is 1 + (k1 − 1)(1 − p), and the throughput attainable from a controller at node B is 1 + (1 − p) + (k1 − 2)p_{1,1}^2. Therefore, as k1 grows large, placing the controller at node A offers up to a 20% gain in throughput over placing the controller at node B, even though B has a higher degree. Furthermore, placing the controller at node A offers up to a 33% gain over placing
the controller at node C. Clearly, computing the optimal controller location has
a significant impact on the throughput performance of the network, and a simple
largest-degree controller placement heuristic is insufficient. Note that placing the controller at node B, the high-degree node, is optimal when p approaches 1/2 and the CSI becomes useless for all links but those adjacent to the controller.
5.1.5 Controller Placement Heuristic
In Section 5.1.3, a mathematical formulation for computing the optimal controller
location in a network was presented, which depends on the distance between each
node, as well as the channel state statistics. However, this computation has a complexity that grows exponentially with the size of the network. Section 5.1.4 shows
that an accurate controller placement heuristic is required to prevent a significant
loss in throughput. In this section, we propose a computationally tractable heuristic
for computing the optimal controller location, which is shown to be near-optimal in
terms of the resulting expected throughput.
Consider the following heuristic for placing the controller. Each node is assigned
a weight based on its degree. As the memory in the channel process decreases, the
best controller location is the node most likely to have an ON neighboring link, i.e., the node with the highest degree. To model this, node n is assigned a weight of (1 − (1 − π)^{Δ_n}), where Δ_n is the degree of node n; this weight is equal to the probability of having an adjacent ON link.
The controller is placed at the location maximizing the information about the
network. Intuitively, the controller should be “close” to as many highly weighted
nodes as possible. However, “closeness” must reflect the memory in the system. Thus,
each node computes a function of the distance to each other node, (1 − p − q)^{d_r(n)}, and the controller is the node maximizing the weighted sum of these discounted distances, as shown in (5.8). In summary,
the controller is placed according to:
c = arg max_r Σ_{n ∈ N} (1 − p − q)^{d_r(n)} (1 − (1 − π)^{Δ_n}).        (5.8)
Placing the controller according to (5.8) preserves the important properties of the
optimal controller placement in (5.7).
The heuristic in (5.8) is very similar to the well-known p-median problem [16], for
p = 1. The 1-median problem seeks to find the node that minimizes the sum distance
to all other nodes. In contrast, the controller placement assigns weights to nodes and
uses a convex function of distance in this computation. These differences ensure that
the controller is placed at the location that yields high throughput, which may not
be the same as the solution to the 1-median problem.
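A direct implementation of (5.8) needs only per-node degrees and hop distances. A minimal sketch in Python (the barbell-like example graph below is a hypothetical stand-in for the topologies in the text):

```python
from collections import deque

def hop_distances(adj, src):
    # BFS hop distances d_src(n) over the adjacency list adj
    dist = {src: 0}
    frontier = deque([src])
    while frontier:
        u = frontier.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                frontier.append(v)
    return dist

def place_controller(adj, p, q):
    # heuristic (5.8): node weight 1-(1-pi)^deg discounted by (1-p-q)^distance
    pi = p / (p + q)
    def score(r):
        dist = hop_distances(adj, r)
        return sum((1 - p - q) ** dist[n] * (1 - (1 - pi) ** len(adj[n]))
                   for n in adj)
    return max(adj, key=score)

# barbell-like toy graph: hub 0 (degree 5) bridged to a smaller hub 5
adj = {0: [1, 2, 3, 4, 5], 1: [0], 2: [0], 3: [0], 4: [0],
       5: [0, 6, 7], 6: [5], 7: [5]}
print(place_controller(adj, 0.1, 0.1))   # the larger, more central hub wins
```

The full sweep costs one BFS per candidate node, i.e. O(|N|(|N| + |L|)), in contrast to the exponential cost of (5.7).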
Consider the barbell network in Figure 5-2.
Figure 5-5 shows the expected
throughput for a controller at node A and a controller at node B, as well as the
value of the heuristic objective in (5.8). These results show the controller placement
Figure 5-5: Evaluation of the controller placement heuristic for the barbell network and various channel transition probabilities p = q. (a) Expected throughput. (b) Heuristic weight.
in (5.8) is similar in terms of throughput to the optimal placement. When the heuristic offers a different controller placement, the difference from the throughput obtained
from the optimal placement is small.
In general, the heuristic returns a controller location that yields throughput close to that of the optimal placement. Consider the NSFNET topology in Figure 5-6. For this topology, the heuristic of (5.8) is applied and compared to the optimal
controller placement, as shown in Table 5.1. Often, the heuristic-optimal controller
placement is the same as the throughput-optimal controller placement. Furthermore,
in instances where the throughput-optimal location differs from the heuristic location,
the controller is placed at a location yielding an average throughput within 1% of
optimal.
5.1.6 Multiple Controllers
In Section 4.6, a partially distributed scheduling scheme, in which the network is
partitioned into sub-networks and multiple controllers are used to control each partition independently, is shown to outperform both fully centralized and distributed
scheduling in certain operating regimes.

Figure 5-6: 14-node NSFNET backbone network (1991).

Strategy    Optimal Placement    Heuristic Placement    % Error
p = 0.05    6                    6                      0
p = 0.1     6                    6                      0
p = 0.15    6                    6                      0
p = 0.2     6                    6                      0
p = 0.25    6                    6                      0
p = 0.3     6                    6                      0
p = 0.35    10                   6                      0.0289
p = 0.4     10                   6                      0.2974
p = 0.45    10                   6                      0.5704

Table 5.1: Results of the controller placement problem over the NSFNET topology. The optimal placement is computed by solving (5.7) via brute force, while the heuristic placement refers to (5.8).

Formulating the optimal k-controller placement problem is difficult due to the necessity of resolving conflicts on the boundary of the control regions. Despite this challenge, the heuristic in Section 5.1.5 can be extended to multiple controllers. This extension is analogous to the extension of the 1-median problem to the p-median problem.
Let r = (r_1, . . . , r_k) ∈ N^k be a vector of locations for the k controllers. The k-controller placement heuristic is formulated as

c = arg max_{r ∈ N^k} Σ_{n ∈ N} (1 − p − q)^{min_i d_{r_i}(n)} (1 − (1 − π)^{Δ_n}).        (5.9)
The k-controller heuristic is similar to the 1-controller heuristic in (5.8), with the
modification that nodes are weighted by a function of the distance to the closest
controller. Assigning each node to the closest controller maximizes the expression in
(5.9), and yields the highest expected throughput since the controller closest to a link
has the most accurate CSI pertaining to that link.
The optimization in (5.9) involves iterating through each combination of k nodes, the complexity of which grows as (N choose k). Therefore, we propose a low-complexity heuristic to place the k controllers. To begin, consider the Myopic Controller Placement algorithm, which places each controller sequentially, assuming the previously placed controllers have been placed optimally.
Algorithm 1 Myopic Controller Placement
1: Given C_0 = {};
2: for j = 1 → k do
3:     c_j = arg max_{r ∈ N} Σ_{n ∈ N} (1 − p − q)^{min_{i ∈ C_{j−1} ∪ {r}} d_i(n)} (1 − (1 − π)^{Δ_n})        (5.10)
4:     C_j = C_{j−1} ∪ {c_j};
5: end for
At each iteration, the myopic controller placement algorithm finds the location for a new controller such that each node is controlled by either a controller in C_{j−1} or the new controller. After executing the myopic controller placement algorithm, C_k is a feasible set of controller locations, but is potentially suboptimal. To improve the quality of this solution, the controller exchange algorithm is used to refine the solution. A similar algorithm is used as a heuristic approximation to the p-median problem in [64].
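The greedy loop of Algorithm 1 can be sketched as follows (Python; the five-node line topology, hop-count distances, and degree-based weights are hypothetical illustrations):

```python
def myopic_placement(nodes, k, dist, weight, gamma):
    # Algorithm 1: greedily add the controller maximizing the objective (5.10),
    # with gamma = 1 - p - q and weight[n] = 1 - (1 - pi)^deg(n)
    placed = []
    for _ in range(k):
        def score(r):
            return sum(gamma ** min(dist[c][n] for c in placed + [r]) * weight[n]
                       for n in nodes)
        placed.append(max((r for r in nodes if r not in placed), key=score))
    return placed

# hypothetical 5-node line 0-1-2-3-4; with pi = 0.5, weight(n) = 1 - 0.5^deg(n)
nodes = list(range(5))
dist = {r: {n: abs(r - n) for n in nodes} for r in nodes}
weight = {n: 1 - 0.5 ** (1 if n in (0, 4) else 2) for n in nodes}
controllers = myopic_placement(nodes, k=2, dist=dist, weight=weight, gamma=0.5)
```

The first controller lands at the center of the line; each additional controller only needs to improve the coverage left by the ones already placed, which is what makes the algorithm myopic.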
The controller exchange algorithm refines the selection of the controllers by selecting an element r ∈ C_k at random and searching for a node that can replace r as a controller and yield a higher throughput. The controller exchange algorithm circumvents the local optima resulting from the myopic placement algorithm.
To verify the performance of these heuristics, each algorithm is run over various
random geometric graphs. A random geometric graph (RGG) with N nodes and
connectivity radius R is a random graph in which N nodes are randomly placed in
Algorithm 2 Controller Exchange Algorithm
Input: C_k: a set of k controller locations;
1: while 1 do
2:     C_0 = C_k
3:     Generate random partition x of C_k
4:     for r ∈ x do
5:         C′ = C_k \ r
6:         c = arg max_{r′ ∈ N} Σ_{n ∈ N} (1 − p − q)^{min_{i ∈ C′ ∪ {r′}} d_i(n)} (1 − (1 − π)^{Δ_n})        (5.11)
7:         if c ≠ r then
8:             C_k = C′ ∪ {c}
9:             Break;
10:        end if
11:    end for
12:    if C_k = C_0 then
13:        Break;
14:    end if
15: end while
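The exchange step can be sketched in the same spirit (Python; a deterministic sweep with a strict-improvement check stands in for the random partition, and the seven-node line topology with uniform weights is a hypothetical illustration):

```python
def heuristic_score(C, nodes, dist, weight, gamma):
    # objective of (5.9): discount each node's weight by the distance
    # to its closest controller
    return sum(gamma ** min(dist[c][n] for c in C) * weight[n] for n in nodes)

def controller_exchange(C, nodes, dist, weight, gamma):
    # Algorithm 2 variant: repeatedly try to swap out one controller for a
    # strictly better replacement until no swap improves the score
    C = list(C)
    improved = True
    while improved:
        improved = False
        for r in list(C):
            rest = [c for c in C if c != r]
            cand = max((n for n in nodes if n not in rest),
                       key=lambda n: heuristic_score(rest + [n], nodes,
                                                     dist, weight, gamma))
            if cand != r and (heuristic_score(rest + [cand], nodes, dist, weight, gamma)
                              > heuristic_score(C, nodes, dist, weight, gamma)):
                C = rest + [cand]
                improved = True
                break
    return C

# hypothetical 7-node line graph; start from a poor placement at one end
nodes = list(range(7))
dist = {r: {n: abs(r - n) for n in nodes} for r in nodes}
weight = {n: 1.0 for n in nodes}          # uniform weights for illustration
final = controller_exchange([0, 1], nodes, dist, weight, gamma=0.5)
```

The strict-improvement test guarantees termination, since the score increases at every accepted swap and there are finitely many placements.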
the unit square, and two points are connected if the Euclidean distance between them
is less than R [52]. Numerous random graphs of 20, 30, and 40 nodes are generated, the myopic placement policy and the controller exchange policy are applied to these RGGs, and each of these algorithms is compared with the solution obtained in (5.9).
The results of this experiment are presented in Table 5.2. The myopic policy is shown
to return a weight close to that of the optimal solution, and the exchange algorithm
offers further improvement. In many instances, the output of the exchange algorithm
is in fact the same as the controller placement of (5.9). Figure 5-7 gives sample
controller placements over RGGs, showing that the controllers are placed at highly
central nodes, while providing good information coverage throughout the network.
5.2 Dynamic Controller Placement
For a fixed controller location, the links physically close to the controller operate at
a higher throughput than those far from the controller due to the delay in CSI. By
relocating the controller, the throughput in different regions of the network can be
Figure 5-7: Random geometric graph with multiple controllers placed using the myopic placement algorithm, followed by the controller exchange algorithm. Link colors correspond to distance from the nearest controller. (a) 3 controllers. (b) 4 controllers.
# Controllers    Myopic    Exchange    Optimal
2 Controllers    91.46     94.241      94.479
3 Controllers    109.68    111.51      111.984
4 Controllers    122.04    122.72      122.82

(a) Experiment 1: 30 nodes, connectivity radius R = 0.275, 10 iterations, p = q = 0.3.

# Controllers    Myopic    Exchange    Optimal
2 Controllers    144.62    149.53      149.55
3 Controllers    166.88    168.88      169.03
4 Controllers    181.01    181.6       181.6
5 Controllers    193.34    193.505     193.515

(b) Experiment 2: 20 nodes, connectivity radius R = 0.35, 20 iterations, p = q = 0.3.

# Controllers    Myopic    Exchange    Optimal
2 Controllers    111.94    113.76      113.76
3 Controllers    136.82    138.66      138.66

(c) Experiment 3: 40 nodes, connectivity radius R = 0.25, 10 iterations, p = q = 0.3.

Table 5.2: Maximum weight for different controller placement algorithms over random geometric graphs.
balanced. In this section, we consider policies which recompute the controller location
dynamically in order to balance the throughput throughout the network.
Figure 5-8: Wireless downlink. Each queue Q_i(t) receives arrivals at rate λ_i and is served over channel S_i(t).
For simplicity of exposition, consider a system of M nodes operating under an interference constraint such that only one node can transmit at any time, as in Figure 5-8. Packets arrive externally to each node i according to an i.i.d. Bernoulli arrival
process Ai (t) of rate λi , and are stored in a queue at that node to await transmission.
Let Qi (t) be the packet backlog of node i at time t. Each node has access to an
independent time-varying ON/OFF channel as in Figure 5-1. If a node is scheduled
for transmission, has a packet to transmit, and has an ON channel, then a packet
departs the system from node i.
The above network model applies directly to a wireless downlink or uplink; however, it can easily be extended to a network setting. First, instead of the controller
selecting one node to transmit, a set of non-interfering nodes is scheduled to transmit.
The extension involves changing the scheduling optimization to be over all matchings
in the network, rather than all individual nodes. Second, in a network, packets are required to traverse multiple hops en route to their destinations. This extension requires a modification to the throughput optimal policy of Theorem 21, analogous to
the approach taken in [48].
In addition to having delayed CSI pertaining to each node j from d_i(j) time-slots in the past, each node i has delayed queue length information (QLI) as well. In other words, node i has delayed CSI S_j(t − d_i(j)) and delayed QLI Q_j(t − d_i(j)) for each other node j. Let S(t − d_r) represent the vector of delayed CSI pertaining to controller r, i.e., S(t − d_r) = {S_i(t − d_r(i))}_i. Let d_max = max_{i,j} d_j(i), i.e., d_max is the network diameter.
As described previously, one node is assigned the role of the controller. The
controller uses delayed CSI and QLI to determine a schedule. Every N time-slots, the
location of the controller is recomputed. In order to do this computation, each node
must be able to compute the controller at the current slot without communicating
with the other nodes. Therefore, the controller selection algorithm must only depend
on globally available information. In particular, we consider algorithms that are based only on sufficiently delayed QLI, and do not use CSI in deciding where to place the controller.
Since CSI and QLI are available at each node with different delays, additional
delays are introduced to ensure that each node has the same view of the network
state for controller placement. In Section 5.2.2, we consider controller placement
policies using only delayed QLI, since it is known that delayed QLI does not affect
the throughput performance of the system [41]. In Section 5.2.3, this is extended to
policies which also use homogeneously delayed CSI for controller placement, as older
CSI might also be available to each node, and can be used to increase the throughput
region.
The primary objective of this work is to determine a joint controller placement
and scheduling policy to stabilize the system of queues. We now provide a definition
of stability.
Definition A queue with backlog Q_i(t) is stable under policy π if

lim sup_{n→∞} (1/n) Σ_{t=0}^{n−1} E[Q_i(t)] < ∞        (5.12)

The complete network is stable if all queues are stable.
Definition The throughput region Λ is the closure of the set of all rate vectors λ that
can be stably supported over the network by a policy π ∈ Π.
Lastly, we define a throughput optimal policy as follows.
Definition A policy is said to be throughput optimal if it stabilizes the system for
any arrival rate λ ∈ Λ.
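The time-average criterion in (5.12) can be estimated empirically for a single queue; a minimal sketch (Python; the Bernoulli service model and the rates are hypothetical illustrations, not part of the formulation):

```python
import random

def time_avg_backlog(lam, mu, horizon=50000, seed=0):
    # empirical version of (5.12): (1/n) * sum_t Q(t) for a discrete-time
    # queue with Bernoulli(lam) arrivals and Bernoulli(mu) service opportunities
    rng = random.Random(seed)
    backlog, total = 0, 0
    for _ in range(horizon):
        if backlog > 0 and rng.random() < mu:   # service opportunity used
            backlog -= 1
        if rng.random() < lam:                   # external arrival
            backlog += 1
        total += backlog
    return total / horizon

avg = time_avg_backlog(0.2, 0.5)   # lam < mu: the time average stays bounded
```

For lam approaching mu the same estimate grows without an apparent bound, which is exactly the behavior the lim sup in (5.12) rules out for a stable queue.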
In this work, we characterize the throughput region of the controller placement and
scheduling problem above, and propose a throughput optimal controller placement
and scheduling policy based on the information available at each node.
5.2.1 Two-Node Example

Figure 5-9: Example 2-node system model.
To illustrate the effect of dynamic controller relocation, consider a two-node system, as in Figure 5-9. Each node has instantaneous CSI pertaining to its channel at
the current time, and 1-step delayed CSI of the other channel. Let Λ1 be the throughput region when the controller is fixed at node 1, and let Λ2 be the throughput region
when the controller is fixed at node 2. The throughput regions Λr are computed for
each r by solving the following linear program (LP).
Maximize: ε
Subject to:

λ_i + ε ≤ Σ_{(s_1,s_2) ∈ S} P(S(t − d_r) = (s_1, s_2)) α_i(s_1, s_2) E[S_i(t) | S_i(t − d_r(i)) = s_i]    ∀i ∈ {1, 2}        (5.13)

Σ_{i=1}^{M} α_i(s_1, s_2) ≤ 1    ∀s ∈ S

α_i(s_1, s_2) ≥ 0    ∀s ∈ S, i ∈ {1, 2}
In the above LP, α_i(s_1, s_2) represents the fraction of time link i is scheduled when the delayed CSI at the controller is (s_1, s_2). To maintain stable queue lengths, the arrival rate to each queue must be less than the service rate at that queue, which is dictated by the fraction of time the node transmits and the expected throughput obtained over that link. For the case when the controller is at node r, Λ_r is the set of arrival rate pairs λ = (λ_1, λ_2) such that there exists a solution to (5.13) satisfying ε* > 0. This corresponds to the existence of a scheduling policy which transmits over link i with probability α_i(s_1, s_2) when the states of the channels are s_1 and s_2 respectively. The proof that Λ_r is in fact the stability region of the system is found in [69].
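For the two-node example, the LP (5.13) is small enough that a coarse grid search over the per-state scheduling fractions approximates ε*. A sketch (Python; the grid resolution is an illustrative choice, not part of the formulation):

```python
import itertools

p = q = 0.1
pi = p / (p + q)                 # stationary P(ON) = 0.5
gamma = 1 - p - q

def predict(s, d):
    # E[S(t) | S(t-d) = s] for the two-state channel chain
    return pi + (s - pi) * gamma ** d

def max_eps(lam1, lam2, d=(0, 1)):
    # controller at node 1: own channel seen instantly, the other channel
    # with a one-slot delay, matching the two-node example
    states = list(itertools.product([0, 1], repeat=2))
    prob = {s: (pi if s[0] else 1 - pi) * (pi if s[1] else 1 - pi)
            for s in states}
    grid = [g / 10 for g in range(11)]          # candidate alpha_1(s) values
    best = float("-inf")
    for alloc in itertools.product(grid, repeat=len(states)):
        mu1 = sum(prob[s] * a * predict(s[0], d[0])
                  for s, a in zip(states, alloc))
        mu2 = sum(prob[s] * (1 - a) * predict(s[1], d[1])
                  for s, a in zip(states, alloc))
        best = max(best, min(mu1 - lam1, mu2 - lam2))
    return best                                  # > 0 iff (lam1, lam2) in Lambda_1
```

Here max_eps(0.45, 0.1) comes out positive while max_eps(0.375, 0.375) is negative, consistent with the asymmetry of Λ_1 and with the (3/8, 3/8) point lying outside either fixed-controller region.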
The throughput regions Λ1 and Λ2 are plotted in Figure 5-10 for the case when
p = q = 0.1. The throughput region is larger in the dimension of the controller, as
a higher throughput is obtained at the node for which current CSI is available. The
other node cannot attain the same throughput due to the CSI delay at the controller.
Now consider a time-sharing policy, alternating between placing the controller at node
1 and placing the controller at node 2. The resulting throughput region Λ is given by
Figure 5-10: Throughput regions (λ_1, λ_2) for different controller scenarios: perfect CSI, controller at node 1, and controller at node 2. Assume the channel state model satisfies p = 0.1, q = 0.1, and d_1(2) = d_2(1) = 1.
the convex hull of Λ_1 and Λ_2, which is shown as the dotted black line in Figure 5-10. Time-sharing between controller placements allows for higher throughputs than if the controller is fixed at either node. For example, the point (λ_1, λ_2) = (3/8 − ε, 3/8 − ε), for ε small, is not attainable by any fixed controller placement; however, this throughput point is achieved by an equal time-sharing between controller locations.
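The (3/8, 3/8) point can be checked directly with a simple per-controller rule: the controller serves itself when its own channel is ON and the other node otherwise. A sketch (Python; this fixed rule is an illustrative policy, not the LP optimum):

```python
p = q = 0.1
pi = p / (p + q)          # stationary P(ON) = 0.5

def service_rates(controller):
    # the controller serves itself when its channel is ON (known exactly);
    # otherwise it serves the other node, whose 1-step prediction averages
    # to pi over the stationary delayed state
    own = pi
    other = (1 - pi) * pi
    return (own, other) if controller == 1 else (other, own)

r1 = service_rates(1)      # rates with the controller fixed at node 1
r2 = service_rates(2)      # rates with the controller fixed at node 2
shared = tuple(0.5 * a + 0.5 * b for a, b in zip(r1, r2))
print(shared)              # equal time-sharing between the two placements
```

Each fixed placement gives an asymmetric rate pair, and averaging the two pairs lands exactly on the symmetric time-sharing point of the convex hull.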
The correct time sharing between controller placements depends on the arrival
rate. However, this information is usually unavailable, and we desire a control policy
that stabilizes the system even if the arrival rates change. Thus, we propose a dynamic
controller placement and scheduling policy which achieves the full throughput region
Λ using only delayed QLI for controller placement, and delayed CSI and QLI for
scheduling, with no information pertaining to the arrival rates.
5.2.2 Queue Length-based Dynamic Controller Placement

We consider controller placement policies that depend only on delayed QLI. We assume that CSI is not available for use in placing the controller (for networks with a large diameter, the common CSI may be too stale to be used in controller placement; thus, we restrict our attention to policies which utilize QLI, but not CSI, to make controller placement decisions). Let Π be the
set of all policies which make a controller-placement decision based on QLI and not
CSI, and schedule a node to transmit based on the delayed CSI and QLI at the controller. This section proves that dynamically computing the controller placement as a
function of queue lengths increases the throughput region over policies with fixed controller placements. The throughput region under such policies is evaluated, and the
dynamic controller placement and scheduling (DCPS) policy is proposed and shown
to stabilize the system for all arrival rates within the throughput region.
Throughput Region
Theorem 20 shows that the throughput region is computed by solving the following
LP.
Maximize: ε
Subject to:

λ_i + ε ≤ Σ_{s ∈ S} P_S(s) Σ_{r=1}^{M} β_r α_i^r(s) E[S_i(t) | S_i(t − d_r(i)) = s_i]    ∀i ∈ {1, . . . , M}

Σ_{i=1}^{M} α_i^r(s) ≤ 1    ∀s ∈ S

α_i^r(s) ≥ 0    ∀s ∈ S, i, r ∈ {1, . . . , M}

Σ_{r=1}^{M} β_r ≤ 1

β_r ≥ 0    ∀r ∈ {1, . . . , M}

(5.14)
This LP is an extension of the LP given in (5.13) to M nodes, with the addition of time sharing between controller locations. The optimization variables β_r and α_i^r(s) correspond to controller placement and link scheduling policies respectively. The variables β_r represent the fraction of the time that node r is elected to be the controller, and α_i^r(s) is the fraction of time that controller r schedules node i when the controller observes a delayed CSI of S(t − d_r) = s. Note that P_S(s) is the stationary probability of the Markov chain in Figure 5-1. The throughput region Λ is the set of all non-negative arrival rate vectors λ such that there exists a feasible solution to (5.14) for which ε* ≥ 0. This implies that there exists a stationary policy such that the effective service rate at each queue is greater than the arrival rate to that queue.
Theorem 20 (Throughput Region). For any non-negative arrival rate vector λ, the
system can be stabilized by some policy P ∈ Π if and only if λ ∈ Λ.
Necessity is shown in Lemma 15, and sufficiency is shown in Theorem 21 by
proposing a throughput optimal joint scheduling and controller placement algorithm,
and proving that for all λ ∈ Λ, that policy stabilizes the system.
Lemma 15. Suppose there exists a policy P ∈ Π that stabilizes the network for all λ ∈ Λ. Then, there exist β_r and α_i^r(s) such that (5.14) has a solution with ε* ≥ 0.
Proof. Suppose the system is stabilized with some control policy P, consisting of
functions βr (t), which chooses a controller independent of channel state, and αir (t)
which chooses a link activation based on delayed CSI at the controller. Without
loss of generality, let βr (t) be an indicator function signaling whether node r is the
controller at time t, and let αir (t) be an indicator signaling whether link i is scheduled
for transmission at time t. Under any such scheme, the following relationship holds
between arrivals, departures, and backlogs for each queue:
Σ_{τ=1}^{t} A_i(τ) ≤ Q_i(t) + Σ_{τ=1}^{t} μ_i(β_r(τ), α_i^r(τ)),        (5.15)
where µi is the service rate of the ith queue as a function of the control decisions.
Expanding μ_i in terms of the decision variables β_r(t) and α_i^r(t) yields

Σ_{τ=1}^{t} A_i(τ) ≤ Q_i(t) + Σ_{τ=1}^{t} Σ_{r=1}^{M} β_r(τ) α_i^r(τ) E[S_i(τ) | S_i(τ − d_r(i))].        (5.16)
Let T_r be the subintervals of [1, t] over which r is the controller. Further, let T_S^r be the subintervals of T_r over which the controller r observes delayed CSI S(t − d_r) = S. Let |T_r| and |T_S^r| be the aggregate lengths of these intervals. Since the arrival and channel state processes are ergodic, and the number of channel states and queues is finite, there exists a time t_1 such that for all t ≥ t_1, the empirical average arrival rates and state occupancy fractions are within ε of their expectations.
(1/t) Σ_{τ=1}^{t} A_i(τ) ≥ λ_i − ε        (5.17)

|T_S^r| / |T_r| ≤ P(S_i(t) = S | r) + ε = P(S_i(t) = S) + ε        (5.18)
The above equations hold with probability 1 from the strong law of large numbers [8].
Furthermore, since the system is stable under the policy P, [48] shows that there exists a V such that for arbitrarily large t,

P( Σ_{i=1}^{M} Q_i(t) ≤ V ) ≥ 1/2.        (5.19)

Thus, let t be a large time index such that t ≥ t_1 and V/t ≤ ε. If Σ_{i=1}^{M} Q_i(t) ≤ V,
the inequality in (5.16) can be rewritten by dividing by t:

(1/t) Σ_{τ=1}^{t} A_i(τ) ≤ V/t + (1/t) Σ_{τ=1}^{t} Σ_{r=1}^{M} β_r(τ) α_i(τ) E[S_i(τ) | S_i(τ − d_r(i))].        (5.20)

λ_i − ε ≤ (1/t) Σ_{τ=1}^{t} A_i(τ) ≤ ε + (1/t) Σ_{τ=1}^{t} Σ_{r=1}^{M} β_r(τ) α_i(τ) E[S_i(τ) | S_i(τ − d_r(i))].        (5.21)
The lower bound in (5.21) follows from (5.17). Since β_r(τ) = 1 if and only if τ ∈ T_r, the inequality in (5.21) is equivalent to

λ_i ≤ 2ε + Σ_{r=1}^{M} (1/t) Σ_{τ ∈ T_r} α_i(τ) E[S_i(τ) | S_i(τ − d_r(i))]        (5.22)

    = 2ε + Σ_{r=1}^{M} (|T_r|/t) (1/|T_r|) Σ_{τ ∈ T_r} α_i(τ) E[S_i(τ) | S_i(τ − d_r(i))]        (5.23)

    = 2ε + Σ_{r=1}^{M} β_r (1/|T_r|) Σ_{τ ∈ T_r} α_i(τ) E[S_i(τ) | S_i(τ − d_r(i))]        (5.24)
The last equation follows from defining

β_r ≜ |T_r| / t,        (5.25)
the empirical fraction of time that r is the controller. Now, break the summation over T_r into separate summations over the sub-intervals T_S^r for each observed S. Note that E[S_i(τ) | S_i(τ − d_r(i))] is the k-step transition probability of the Markov chain in Figure 5-1 for k = d_r(i).
λ_i ≤ 2ε + Σ_{r=1}^{M} β_r Σ_{S ∈ S} (1/|T_r|) Σ_{τ ∈ T_S^r} α_i(τ) p_{S_i,1}^{d_r(i)}        (5.26)

    = 2ε + Σ_{r=1}^{M} β_r Σ_{S ∈ S} (|T_S^r|/|T_r|) (1/|T_S^r|) Σ_{τ ∈ T_S^r} α_i(τ) p_{S_i,1}^{d_r(i)}        (5.27)

    = 2ε + Σ_{r=1}^{M} β_r Σ_{S ∈ S} (|T_S^r|/|T_r|) α_i^r(S) p_{S_i,1}^{d_r(i)}        (5.28)

    ≤ Σ_{r=1}^{M} β_r Σ_{S ∈ S} P(S_i(t) = S) α_i^r(S) p_{S_i,1}^{d_r(i)} + (2 + |S|)ε        (5.29)
where (5.28) follows from defining the fraction of time that link i is scheduled given r and S as

α_i^r(S) ≜ (1/|T_S^r|) Σ_{τ ∈ T_S^r} α_i(τ),        (5.30)
and (5.29) follows from (5.18) and the fact that controller placement is independent of channel state. Because the control functions satisfy Σ_r β_r(t) ≤ 1 and Σ_i α_i(t) ≤ 1, it follows that β_r and α_i^r satisfy those same constraints. Furthermore, the fraction of time node r is the controller, β_r, is independent of the CSI.
The above inequality assumes Σ_{i=1}^{M} Q_i(t) ≤ V, which holds with probability greater than 1/2 by (5.19). Hence, there exists a set of stationary control decisions β_r and α_i^r satisfying the necessary constraints such that (5.29) holds for all i. If there did not exist such a stationary policy, then this inequality would hold with probability 0. Therefore, λ is arbitrarily close to a point in the region Λ, implying the constraints imposed by Λ are necessary for stability.
Lemma 15 shows that for all λ ∈ Λ, there exists a stationary policy STAT ∈ Π that stabilizes the system by placing the controller at r with probability β_r and scheduling i to transmit, when the delayed CSI at controller r is S, with probability α_i^r(S).
Queuing Dynamics
Consider a scheduling and controller placement policy P ∈ Π. Let D_i^P(t) be the departure process of queue i, such that D_i^P(t) = 1 if there is a departure from queue i at time t under policy P. Consider the evolution of the queues over T time slots, subject to a scheduling policy P.
Q_i(t + T) ≤ [ Q_i(t) − Σ_{k=0}^{T−1} D_i^P(t + k) ]^+ + Σ_{k=0}^{T−1} A_i(t + k)        (5.31)
Equation (5.31) is an inequality rather than an equality due to the assumption that
the departures are taken from the backlog at the beginning of the T -slot period, and
the arrivals occur at the end of the T slots. Under this assumption, the packets that
arrive within the T -slot period cannot depart within this period. The square of the
queue backlog is bounded using the inequality in (5.31).
Q_i^2(t + T) ≤ Q_i^2(t) + ( Σ_{k=0}^{T−1} A_i(t + k) )^2 + ( Σ_{k=0}^{T−1} D_i^P(t + k) )^2
              + 2 Q_i(t) ( Σ_{k=0}^{T−1} A_i(t + k) − Σ_{k=0}^{T−1} D_i^P(t + k) )        (5.32)
The above bound follows using the fact that A_i(t) ≥ 0 and D_i(t) ≥ 0. Denote by Y(t) the relevant system state at time t. Since the CSI is delayed by different amounts of time depending on the location of the controller, and the controller changes locations over time, Y(t) is defined to include all possible combinations of delayed CSI, as well as the complete history of QLI:

Y(t) = [ S(t − d_max), . . . , S(t), Q(0), . . . , Q(t) ]        (5.33)
This definition ensures the system state is Markovian. Note that the system state
Y(t) is not completely available to the controller, since each node has delayed CSI.
Because dmax is the largest delay to CSI in the network, values of S(τ ) for τ < t−dmax
do not affect the evolution of the system.
Due to the ergodicity of the finite state Markov chain governing the channel state process, for any δ > 0, there exists an N such that the probability of the channel state conditioned on the channel state N slots in the past is within δ of the steady state probability of the Markov chain:

| P( S(t) = s | S(t − N) ) − P( S(t) = s ) | ≤ δ        (5.34)

Define T_SS(ε) to be a large constant such that when N = T_SS(ε), (5.34) is satisfied for δ = ε/(2|S|), where |S| = 2^M. In other words,

| P( S(t) = s | S(t − T_SS(ε)) ) − P( S(t) = s ) | ≤ ε/(2|S|)        (5.35)
T_SS(ε) is related to the time it takes the Markov chain to approach its steady state distribution.
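For the two-state chain of Figure 5-1, the deviation in (5.34) decays as |1 − p − q|^N, so T_SS(ε) admits a closed-form bound. A sketch (Python; bounding each per-link deviation by |1 − p − q|^N is an assumption made for illustration):

```python
import math

def t_ss(p, q, eps, M):
    # smallest N with |1-p-q|^N <= eps / (2 * |S|), where |S| = 2^M, per (5.35);
    # for the two-state chain, |P(S(t)=s | S(t-N)) - P(S(t)=s)| <= |1-p-q|^N
    delta = eps / (2 * 2 ** M)
    gamma = abs(1 - p - q)
    if gamma == 0:
        return 1                 # the chain mixes in a single step
    return max(1, math.ceil(math.log(delta) / math.log(gamma)))
```

For example, with p = q = 0.2, ε = 0.1, and M = 2, the bound requires N = 9 slots, since 0.6^9 ≈ 0.01 already falls below δ = 0.0125.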
Dynamic Controller Placement and Scheduling (DCPS) Policy
In this section, we propose the dynamic controller placement and scheduling policy,
and show that this policy stabilizes the network whenever the arrival rate vector is
interior to the capacity region Λ. Additionally, this proves the sufficient condition of
Theorem 20. While the problem formulation allows the controller to be repositioned every N time-slots, in this section we prove throughput optimality for N = 1. The extension to general N is discussed in Section 5.3.1.
Theorem 21. Consider the dynamic controller placement and scheduling (DCPS) policy, which operates in two steps. First, choose a controller by solving the following optimization as a function of the delayed queue backlogs Q_i(t − τ_Q):

r* = arg max_r Σ_{s ∈ S} P_S( S(t − d_r) = s ) max_i Q_i(t − τ_Q) p_{s_i,1}^{d_r(i)}        (5.36)

where P_S(s) is the steady state probability of the channel-state process. Then, the controller uses its observed CSI S(t − d_{r*}(i)) = s, and schedules the following queue to transmit:

i* = arg max_i Q_i(t − τ_Q) p_{s_i,1}^{d_{r*}(i)}        (5.37)

For any arrival rate λ and ε > 0 satisfying λ + ε1 ∈ Λ, the DCPS policy stabilizes the system if τ_Q ≥ d_max + T_SS(ε), for T_SS(ε) defined in (5.35).
Under the DCPS policy, the controller is placed at the node maximizing the expected max-weight schedule over all possible states. Then, the controller observes the delayed CSI and schedules the max-weight queue for transmission, as in [48] and [69]. Moving the controller to nodes with high backlog increases the throughput at those nodes, keeping the system stable.
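The DCPS rules (5.36) and (5.37) can be exercised in simulation. The sketch below (Python) assumes a symmetric topology in which every pair of nodes is separated by d hops, so the placement step reduces to picking the largest delayed backlog; all parameters are illustrative choices:

```python
import random

def simulate_dcps(M=3, lam=0.1, p=0.2, q=0.2, d=2, horizon=20000, seed=1):
    # symmetric downlink: each node sees its own channel instantly and every
    # other channel -- and all queue lengths -- with a d-slot delay (tau_Q = d)
    rng = random.Random(seed)
    pi = p / (p + q)
    gamma = 1 - p - q
    s_hist = [[1] * M]                 # channel history; s_hist[-1] is S(t)
    q_hist = [[0] * M]                 # queue-length history
    backlog_total = 0
    for _ in range(horizon):
        s, queues = s_hist[-1], q_hist[-1]
        s_old = s_hist[max(0, len(s_hist) - 1 - d)]
        q_old = q_hist[max(0, len(q_hist) - 1 - d)]
        # placement (5.36): by symmetry, the largest tau_Q-delayed backlog
        r = max(range(M), key=lambda n: q_old[n])
        # scheduling (5.37): delayed backlog times the predicted P(ON)
        def weight(i):
            delay = 0 if i == r else d
            seen = s[i] if i == r else s_old[i]
            return q_old[i] * (pi + (seen - pi) * gamma ** delay)
        served = max(range(M), key=weight)
        nxt = list(queues)
        if s[served] == 1 and nxt[served] > 0:
            nxt[served] -= 1                              # one departure
        for i in range(M):
            if rng.random() < lam:
                nxt[i] += 1                               # Bernoulli arrival
        s_hist.append([1 - c if rng.random() < (q if c else p) else c
                       for c in s])
        q_hist.append(nxt)
        backlog_total += sum(nxt)
    return backlog_total / horizon

avg_backlog = simulate_dcps()
```

For a light aggregate load the time-average total backlog stays small, which is the stability behavior the Lyapunov-drift argument of Section 5.5.1 establishes formally.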
Theorem 21 is proved in Section 5.5.1. The proof follows using a Lyapunov drift
technique [48], and shows that as the system backlogs grow large, the drift becomes
negative, implying system stability. We consider the Lyapunov drift over a T -slot
window, where T is large enough that the system reaches its steady state distribution.
The throughput optimal controller placement uses delayed QLI Q(t − τ_Q). The delay τ_Q is sufficiently large such that Q(t − τ_Q) is available at every node, i.e., τ_Q ≥ d_max. Furthermore, we require that τ_Q ≥ d_max + T_SS(ε), where T_SS(ε) is the time required for the channel state process to approach its steady state distribution (i.e., the mixing time of the Markov process). Even though QLI is available with much less delay,
the controller must use an older version of the QLI for throughput optimality. The reasoning behind this is that long queues are typically located at nodes with OFF channels; however, if the QLI is sufficiently delayed, it is independent of the current channel state. This property of the optimal policy is investigated further in Chapter 6.
Example: Homogeneous Delays
Figure 5-11: Example star network topology where each node measures its own channel
state instantaneously, and has d-step delayed CSI of each other node.
For specific topologies, the throughput optimal controller placement in (5.36) takes
on a simpler form. In particular, this section examines topologies for which each node
is equidistant from all other nodes, as in Figure 5-11.
Corollary 5. Consider a system of M nodes, where only one can transmit at each
time. Assume the controller has full knowledge of its own channel state and d-slot
delayed CSI for each other channel, as in Figure 5-11. At time t, the DCPS policy
places the controller at the node with the largest backlog at time t − τ_Q:

r* = arg max_r Q_r(t − τ_Q)        (5.38)
Corollary 5 is proven by showing that the expression in (5.36) simplifies to (5.38) under the setting of homogeneous delays, which follows from the symmetry of the system. A detailed proof is provided in the Appendix. Note the queue lengths in the above corollary must still be delayed according to Theorem 21.
5.2.3 Controller Placement With Global Delayed CSI
In the previous section, the throughput optimal joint controller placement and scheduling policy was presented under the restriction to policies which use only delayed QLI for controller placement. The motivation behind this restriction is that sufficiently delayed QLI is available at each node, allowing the controller location to be computed without communication between nodes. In the same vein, the channel state of each node dmax slots ago is also globally available knowledge, since dmax is the largest CSI delay in the network. If the network has a small diameter, or a high degree of memory, this additional CSI has a significant impact on performance. In this section, we characterize the new throughput region, and propose an extension to the DCPS policy which stabilizes the system for all arrival rates within this stability region.
Throughput Region
The new throughput region is computed by solving the following LP.
Maximize: \epsilon
Subject to:
\lambda_i + \epsilon \le \sum_{s \in \mathcal{S}} P(S(t - d_{max}) = s) \sum_{r=1}^{M} \beta_r(s) \sum_{s' \in \mathcal{S}} P(S(t - d_r(i)) = s' \mid S(t - d_{max}) = s)\, \alpha_i^r(s')\, p_{s'_i,1}^{d_r(i)} \qquad \forall i \in \{1, \dots, M\}
\sum_{i=1}^{M} \alpha_i^r(s') \le 1 \qquad \forall s' \in \mathcal{S}, r \in \{1, \dots, M\}
\alpha_i^r(s') \ge 0 \qquad \forall s' \in \mathcal{S},\ i, r \in \{1, \dots, M\}
\sum_{r=1}^{M} \beta_r(s) \le 1 \qquad \forall s \in \mathcal{S}
\beta_r(s) \ge 0 \qquad \forall s \in \mathcal{S},\ r \in \{1, \dots, M\} \qquad (5.39)
This LP is an extension of (5.14), allowing β_r to be a function of S(t − dmax). The optimization variables β_r(s) and α_i^r(s') correspond to controller placement and link scheduling policies, respectively. Note that p_{s_i,1}^{d_r(i)} is the k-step transition probability (where k = d_r(i)) of the Markov channel state (Figure 5-1). The throughput region, Λ, is the set of all non-negative arrival rate vectors λ such that there exists a feasible solution to (5.39) for which ε ≥ 0.
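For the two-state (ON/OFF) channel of Figure 5-1, the k-step transition probability entering the LP has the standard closed form π + (1 − p − q)^k (1{s = ON} − π), with π = p/(p + q). A small self-check (the helper name is ours; p = P(OFF→ON), q = P(ON→OFF)):

```python
def p_d_to_on(s, d, p, q):
    """d-step probability that a two-state Markov channel is ON, given the
    state s (1 = ON, 0 = OFF) d slots ago; p = P(OFF->ON), q = P(ON->OFF)."""
    pi = p / (p + q)                       # stationary ON-probability
    return pi + (1.0 - p - q) ** d * ((1 if s else 0) - pi)

# one-step sanity checks: stay ON with prob 1 - q, turn ON with prob p
print(round(p_d_to_on(1, 1, 0.1, 0.1), 6))  # 0.9
print(round(p_d_to_on(0, 1, 0.1, 0.1), 6))  # 0.1
```

As d grows, both expressions converge to the stationary probability π, which is what makes sufficiently delayed CSI effectively state-independent.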
Theorem 22 (Throughput Region). For any non-negative arrival rate vector λ, the
system is stabilized by some policy P ∈ Π if and only if λ ∈ Λ.
Necessity is shown in Lemma 16, and sufficiency is shown in Theorem 23 by proposing a throughput optimal joint scheduling and controller placement algorithm and proving that for all λ ∈ Λ, that policy stabilizes the system.
Lemma 16. Suppose there exists a policy P ∈ Π that stabilizes the system. Then there exist variables β_r(s) and α_i^r(s') such that (5.39) has a solution with ε ≥ 0.
Lemma 16 shows that for all λ ∈ Λ, there exists a stationary policy STAT ∈ Π that stabilizes the system by placing the controller at r with probability β_r(S) when the maximally delayed CSI is S(t − dmax) = S, and scheduling i to transmit with probability α_i^r(S') when the delayed CSI at controller r is S(t − d_r) = S'.
Dynamic Controller Placement and Scheduling (DCPS) Policy
Consider the queueing model of Section 5.2.2, which holds for the case when controller
placement uses delayed CSI and QLI as well. In this section, we extend the dynamic
controller placement and scheduling (DCPS) policy of Section 5.2.2 to utilize delayed
CSI, and show that this policy stabilizes the system for all arrival rates within Λ.
This proves the sufficient condition of Theorem 22.
Theorem 23. Consider the modified DCPS policy, which operates in two steps. First,
choose a controller by solving the following optimization as a function of the delayed
queue backlogs Q(t − τQ ) and delayed CSI S(t − dmax ).
r^* = \arg\max_r \sum_{s \in \mathcal{S}} P\big( S(t - d_r(i)) = s \mid S(t - d_{max}) \big) \max_i Q_i(t - \tau_Q)\, p_{s_i,1}^{d_r(i)} \qquad (5.40)
The controller observes CSI S(t − d_{r^*}(i)) = s, and schedules the following queue to transmit:
i^* = \arg\max_i Q_i(t - \tau_Q)\, p_{s_i,1}^{d_{r^*}(i)} \qquad (5.41)
The DCPS policy in (5.40) and (5.41) is throughput optimal if τQ > dmax.
The proof of Theorem 23 is given in the Appendix, and follows according to
the steps of the proof of Theorem 21 with modifications made to the conditioning
throughout the proof.
Under the DCPS policy, the controller is placed at the node maximizing the expected max-weight schedule over all possible states, where this expectation is conditioned on globally available delayed CSI. The controller then observes the delayed CSI and activates the max-weight schedule for transmission according to [48] and [69]. Note that for controller placement policies which only use QLI, a very large delay is required for the DCPS policy, as the channel state must be independent of the queue length at that time. On the other hand, when the controller placement policy also depends on the delayed CSI S(t − dmax), the queue length delay only needs to be larger than dmax. This follows because conditioning on the CSI removes the dependence of the channel state on the delayed QLI.
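As an illustration of the two-step decision in (5.40)-(5.41), the following sketch enumerates the joint CSI vector a candidate controller would observe, assuming independent two-state channels. All function and variable names are ours, and the enumeration is exponential in M, so this is only meant for small examples.

```python
import itertools

def p_on(s, d, p, q):
    """d-step ON probability of a two-state Markov channel from state s
    (1 = ON, 0 = OFF); p = P(OFF->ON), q = P(ON->OFF)."""
    pi = p / (p + q)
    return pi + (1.0 - p - q) ** d * (s - pi)

def place_controller(Q_del, S_dmax, delays, dmax, p, q):
    """Step one, cf. (5.40): score each candidate controller r by the expected
    max-weight schedule it could activate, conditioning each channel on the
    globally known state S(t - dmax) and enumerating the CSI vector s that r
    would observe (component i at delay delays[r][i])."""
    M = len(Q_del)
    def expected_max_weight(r):
        total = 0.0
        for s in itertools.product((0, 1), repeat=M):
            prob = 1.0
            for i in range(M):
                # evolve channel i from time t - dmax to time t - d_r(i)
                on = p_on(S_dmax[i], dmax - delays[r][i], p, q)
                prob *= on if s[i] else (1.0 - on)
            total += prob * max(Q_del[i] * p_on(s[i], delays[r][i], p, q)
                                for i in range(M))
        return total
    return max(range(M), key=expected_max_weight)

def schedule_link(Q_del, S_obs, delays, r, p, q):
    """Step two, cf. (5.41): at controller r with observed CSI S_obs,
    transmit from the queue maximizing Q_i * p^{d_r(i)}_{s_i,1}."""
    return max(range(len(Q_del)),
               key=lambda i: Q_del[i] * p_on(S_obs[i], delays[r][i], p, q))
```

For two nodes that see their own channel instantly and the other's with a 2-slot delay, placement favors the node with the large backlog, matching the intuition of Corollary 5.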
5.3
Simulation Results
To begin, we simulate a 6-queue system with Bernoulli arrival processes of different rates. Assume the controller has instantaneous CSI for its own channel, and homogeneously delayed (2 slots) CSI of each other channel. For each symmetric arrival rate vector λ, we simulate the evolution of the system over 100,000 time-slots, and compute the average system backlog over this time. The results are plotted in Figure 5-12. Clearly, for small arrival rates, the average queue length remains very small. As the arrival rates increase toward the boundary of the stability region, the average system backlog starts to increase. When the arrival rate grows beyond the stability region, the average queue length increases dramatically, since packets arrive faster than they can be served, implying that the system is unstable in this region.
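The qualitative behavior described above already shows up in a toy single-queue analogue (ours, not the thesis's 6-queue model): when the arrival rate exceeds the average service rate, the time-averaged backlog blows up.

```python
import random

def avg_backlog(lam, mu, T=100000, seed=1):
    """Toy single queue: Bernoulli(lam) arrivals and Bernoulli(mu) service
    opportunities per slot; returns the time-averaged backlog over T slots.
    lam and mu are illustrative parameters, not the thesis's."""
    rng = random.Random(seed)
    q, total = 0, 0
    for _ in range(T):
        q += rng.random() < lam          # arrival
        if q and rng.random() < mu:      # service opportunity
            q -= 1
        total += q
    return total / T
```

With lam = 0.2, mu = 0.5 the average backlog stays near zero; with lam = 0.7, mu = 0.5 it grows roughly linearly in time, the signature of instability.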
Figure 5-12 compares several controller placement policies. First, we plot the
results of a fixed controller policy, as in Section 5.1. This is compared with a policy
that chooses a controller at each time uniformly at random. Note that this random
policy is optimal when the arrival rate is the same to each node, as it represents the
correct stationary policy to stabilize the system. The red curve corresponds to the
DCPS policy using QLI for controller placement, and the green curve corresponds to
the DCPS policy using both QLI and delayed CSI to place the controller.
First of all, dynamically changing the controller location provides a 7% increase in the capacity region over static controller placement. Additionally, observe that in this example, choosing a controller based on homogeneously delayed CSI as well as QLI offers a 6% increase in the capacity region over the region for policies restricted to using only QLI.
In Figure 5-12a, the DCPS policy uses 2-step delayed QLI to place the controller.
In this case, the DCPS policy fails to stabilize the system for the same set of arrival
rates as the time-sharing policy, implying that the DCPS policy is not throughput
optimal. However, in Figure 5-12b, the delay on the QLI is increased to 100 time-slots.
In this scenario, the DCPS policy does stabilize the system for all symmetric arrival
rates in the stability region. Thus, using further delayed information is required for
throughput optimality. Note that using further delayed QLI in the DCPS policy
where CSI is also used does not affect the stability of the system.
The results in Figure 5-13 illustrate the effect of the delay in QLI on the stability
of the system. This figure presents four different values for τQ , the delay to QLI used
by the controller placement policy. The black dashed line corresponds to τQ = 0, the
blue dash-dot line corresponds to τQ = 4, the red dotted line corresponds to τQ = 8,
and the green solid line corresponds to τQ = 50. As τQ increases, the system remains
stable for more arrival rates. In this example, using sufficiently delayed QLI yields a
16% increase in the stability region of the system.
Additionally, we simulate the controller placement problem over a network, to
compare the dynamic controller placement with the static controller placement in
(a) 2-Step Delayed QLI
(b) 100-Step Delayed QLI
Figure 5-12: Simulation results for different controller placement policies, with channel
model parameters p = 0.1, q = 0.1.
Figure 5-13: Effect of QLI-delay on system stability, for p = q = 0.1. Each curve corresponds
to a different value of τQ .
Figure 5-14: Two-level binary tree topology.
Section 5.1. Consider the simple network in Figure 5-14. Figure 5-15 analyzes the
stability of the system over different controller placement policies. The black solid
line represents the DCPS policy, with QLI delay τQ = 150. This policy is compared
with the policy that randomly selects the controller and the policy that places the
controller at node 3. These results show that relocating the controller according to the DCPS policy yields improvements over both the optimal static placement and an equal time-sharing between controller placements.
Figure 5-16 shows the fraction of time each node is selected as the controller under
the DCPS policy for the binary-tree topology of Figure 5-14. For small transition
Figure 5-15: Results for different controller placement policies on the tree network in Figure 5-14: DCPS policy with τQ = 150, equal time-sharing, and fixed controller at node 3. Simulation ran for 40,000 time slots with p = q = 0.3.
probabilities (e.g. p = q = 0.1), the central node 3 is chosen as the controller
most frequently. When the transition probabilities increase (e.g. p = q = 0.3),
then more time is spent with nodes 2 and 4 as controllers. This corresponds with
the analysis for static controller placement shown in Section 5.1.2. Moreover, as the
arrival rate increases toward the boundary of the stability region, the results resemble
the static results even more closely. Note that the DCPS policy can be applied to
any network topology, but we only consider smaller topologies in this work due to the
computational complexity of computing the optimal controller location.
5.3.1
Infrequent Controller Relocation
Throughout this chapter, we assume that a new controller is chosen at every time
slot. This is justified by ensuring that the controller placement algorithm depends
only on information that is available to each node in the network. Thus, there is
no additional communication overhead required to compute the controller placement.
(a) Symmetric arrival rate λ = 0.2.
(b) Symmetric arrival rate λ = 0.25.
Figure 5-16: Fraction of time each node is selected as the controller under DCPS for the
topology in Figure 5-14. Blue bars correspond to system with p = q = 0.1, and red bars
correspond to system with p = q = 0.3.
However, there may be an additional cost associated with relocating the controller
due to the computation required. Therefore, in this section, we consider the case in
which the controller placement occurs infrequently.
Consider a modified version of the controller placement problem, in which the
controller is relocated every N time slots. As discussed in Section 5.2.2, the throughput region is not affected by infrequent controller placement. Lemma 15 shows that
any arrival rate λ ∈ Λ corresponds to a stationary policy which stabilizes the system.
The throughput region Λ is formed by a time-sharing between controller placements.
Consequently, the frequency of changing the controller placement does not affect
throughput, but rather the overall fraction of time spent in each controller state.
The DCPS policy of Section 5.2.2 extends directly to the case of infrequent controller placement as follows.
Theorem 24. Consider the dynamic controller placement and scheduling policy (DCPS), which operates in two steps. First, at each time t = kN, choose a controller by solving the following optimization as a function of the delayed queue backlogs Q_i(kN − τQ):
r^* = \arg\max_r \sum_{s \in \mathcal{S}} P_S(s) \max_i Q_i(kN - \tau_Q)\, p_{s_i,1}^{d_r(i)} \qquad (5.42)
where P_S(s) is the steady-state probability of the channel-state process. At the subsequent time slots t = kN + j, the controller uses its observed CSI S(kN + j − d_{r^*}(i)) = s, and schedules the following queue to transmit:
i^* = \arg\max_i Q_i(kN - \tau_Q)\, p_{s_i,1}^{d_{r^*}(i)} \qquad (5.43)
For any arrival rate λ and ε > 0 satisfying λ + ε1 ∈ Λ, the DCPS policy stabilizes the system if τQ ≥ dmax + TSS(ε) for TSS(ε) defined in (5.35).
The DCPS policy of Theorem 24 differs from that of Theorem 21 in that controller placement decisions are only made in time slots which are multiples of N, but the controller placement calculation is the same as in Theorem 21. The scheduling portion of Theorem 24 uses the QLI delayed with respect to the time at which the controller was placed, rather than the current time slot. This additional delay in QLI does not affect the throughput optimality of the policy. The proof of Theorem 24 follows similarly to that of Theorem 21, except using a T-slot drift argument at time slots t = kN rather than at every time slot.
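The timing structure of Theorem 24 can be sketched as follows; `place` and `schedule` are placeholders for the optimizations (5.42) and (5.43), and all names here are illustrative:

```python
def run_with_infrequent_placement(T, N, place, schedule):
    """Sketch of Theorem 24's timing: the controller location is recomputed
    only at slots t = kN; scheduling happens every slot using the location
    chosen at the start of the frame. Returns the number of relocations."""
    relocations = 0
    r = None
    for t in range(T):
        if t % N == 0:
            r = place(t)       # (5.42), using QLI Q(kN - tau_Q)
            relocations += 1
        schedule(t, r)         # (5.43), same delayed QLI within the frame
    return relocations

print(run_with_infrequent_placement(100, 10, lambda t: 0, lambda t, r: None))  # 10
```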
5.4
Conclusion
In Chapter 4, we showed that delayed CSI is inherent in centralized schemes, and that this delay is related to the topology of the network. This chapter studied the impact of controller location on the performance of centralized scheduling in wireless networks, as the location of the controller directly influences the delays at which CSI is available to the controller.
First, we formulated the optimal static controller placement problem, and developed near-optimal, low-complexity heuristics to place controllers over large networks. We then considered dynamically placing controllers, using queue length information (QLI) to move the controller to the heavily backlogged areas of the network. We characterized the throughput region under dynamic controller placement, and proposed a throughput optimal joint controller placement and scheduling policy. This policy uses significantly delayed QLI to place the controllers, and the CSI available at the controller to schedule links. We extended this policy to the case where CSI can also be used to place the controller.
An interesting result in this section is that when the controller placement depends
only on delayed QLI, the throughput optimal policy uses a very delayed version of
the QLI, even if better QLI is available. This is due to the fact that QLI is related to
CSI, particularly if there is a high degree of memory in the system. This is explored
further in Chapter 6.
5.5
Appendix
5.5.1
Proof of Theorem 21
Theorem 21: Consider the dynamic controller placement and scheduling (DCPS)
policy, which operates in two steps. First, choose a controller by solving the following
optimization as a function of the delayed queue backlogs Qi (t − τQ ).
r^* = \arg\max_r \sum_{s \in \mathcal{S}} P_S\big( S(t - d_r) = s \big) \max_i Q_i(t - \tau_Q)\, p_{s_i,1}^{d_r(i)} \qquad (5.44)
where P_S(s) is the steady-state probability of the channel-state process. Then the controller uses its observed CSI S(t − d_{r^*}(i)) = s, and schedules the following queue to transmit:
i^* = \arg\max_i Q_i(t - \tau_Q)\, p_{s_i,1}^{d_{r^*}(i)} \qquad (5.45)
For any arrival rate λ and ε > 0 satisfying λ + ε1 ∈ Λ, the DCPS policy stabilizes the system if τQ ≥ dmax + TSS(ε) for TSS(ε) defined in (5.35).
Proof of Theorem 21. Define the following quadratic Lyapunov function:
L(Q(t)) = \frac{1}{2} \sum_{i=1}^{M} Q_i^2(t). \qquad (5.46)
The T-step Lyapunov drift is computed as
\Delta_T(Y(t)) = E\big[ L(Q(t + T)) - L(Q(t)) \,\big|\, Y(t) \big] \qquad (5.47)
We show that under the throughput optimal policy, the T-step Lyapunov drift is negative for large backlogs, implying the stability of the system under the throughput optimal max-weight policy for all arrival rates in the interior of Λ, by the Foster-Lyapunov criteria [49]. To prove throughput optimality of the DCPS policy, we bound the Lyapunov drift under DCPS by combining (5.32), (5.46) and (5.47), and show that for large queue lengths, the Lyapunov drift is negative. Let D_i(t) = D_i^{DCPS}(t)
refer to the departure process of policy DCPS. Consider the T-step Lyapunov drift for T > τQ.
\Delta_T(Y(t)) \le E\bigg[ \sum_{i=1}^{M} \frac{1}{2}\Big( \sum_{k=0}^{T-1} A_i(t+k) \Big)^2 + \frac{1}{2}\Big( \sum_{k=0}^{T-1} D_i(t+k) \Big)^2 \qquad (5.48)
\qquad\qquad + Q_i(t) \Big( \sum_{k=0}^{T-1} A_i(t+k) - \sum_{k=0}^{T-1} D_i(t+k) \Big) \,\bigg|\, Y(t) \bigg] \qquad (5.49)
\le B + E\bigg[ \sum_{i=1}^{M} Q_i(t) \Big( \sum_{k=0}^{T-1} A_i(t+k) - \sum_{k=0}^{T-1} D_i(t+k) \Big) \,\bigg|\, Y(t) \bigg] \qquad (5.50)
where B is a constant, which is finite due to the boundedness of the second moment
of the arrival and departure process.
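For concreteness, the Lyapunov function (5.46) and a sample of the T-step drift (5.47) are trivial to evaluate numerically (illustrative helpers, ours):

```python
def lyapunov(Q):
    """Quadratic Lyapunov function (5.46): L(Q) = 0.5 * sum_i Q_i^2."""
    return 0.5 * sum(q * q for q in Q)

def t_step_drift(Q_t, Q_tT):
    """One sample of the T-step drift (5.47): L(Q(t+T)) - L(Q(t));
    the expectation in (5.47) averages this over sample paths."""
    return lyapunov(Q_tT) - lyapunov(Q_t)

print(t_step_drift([3, 4], [1, 2]))  # -10.0
```

A negative value on average, for large backlogs, is exactly the condition the proof establishes.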
The difference between queue lengths at any two times t and s is bounded using the following inequality:
|Q_i(t) - Q_i(s)| \le |t - s|, \qquad (5.51)
which holds by assuming that an arrival occurs in each slot and no departures occur, or vice versa. Using this inequality, a relationship is established between current queue lengths and delayed queue lengths:
Q_i(t) \le Q_i(t + k - \tau_Q) + |k - \tau_Q| \qquad (5.52)
Q_i(t) \ge Q_i(t + k - \tau_Q) - |k - \tau_Q| \qquad (5.53)
The inequalities in (5.52) and (5.53) are used in (5.49) to upper bound the Lyapunov drift in terms of the delayed QLI for each slot, Q_i(t + k − τQ).
\Delta_T(Y(t)) \le B + E\bigg[ \sum_{k=0}^{T-1} \sum_{i=1}^{M} Q_i(t) A_i(t+k) - \sum_{k=0}^{T-1} \sum_{i=1}^{M} Q_i(t) D_i(t+k) \,\bigg|\, Y(t) \bigg] \qquad (5.54)
\le B + E\bigg[ \sum_{k=0}^{T-1} \sum_{i=1}^{M} \big( Q_i(t + k - \tau_Q) + |k - \tau_Q| \big) \lambda_i - \sum_{k=0}^{T-1} \sum_{i=1}^{M} \big( Q_i(t + k - \tau_Q) - |k - \tau_Q| \big) D_i(t+k) \,\bigg|\, Y(t) \bigg] \qquad (5.55)
= B + \sum_{k=0}^{T-1} |k - \tau_Q|\, E\bigg[ \sum_{i=1}^{M} \big( \lambda_i + D_i(t+k) \big) \,\bigg|\, Y(t) \bigg] + E\bigg[ \sum_{k=0}^{T-1} \sum_{i=1}^{M} Q_i(t + k - \tau_Q) \big( \lambda_i - D_i(t+k) \big) \,\bigg|\, Y(t) \bigg] \qquad (5.56)
\le B + 2M \sum_{k=0}^{T-1} |k - \tau_Q| + E\bigg[ \sum_{k=0}^{T-1} \sum_{i=1}^{M} Q_i(t + k - \tau_Q) \big( \lambda_i - D_i(t+k) \big) \,\bigg|\, Y(t) \bigg] \qquad (5.57)
\le B + 2MT^2 + E\bigg[ \sum_{k=0}^{T-1} \sum_{i=1}^{M} Q_i(t + k - \tau_Q) \big( \lambda_i - D_i(t+k) \big) \,\bigg|\, Y(t) \bigg] \qquad (5.58)
\le B' + E\bigg[ \sum_{k=0}^{T-1} \sum_{i=1}^{M} Q_i(t + k - \tau_Q) \big( \lambda_i - D_i(t+k) \big) \,\bigg|\, Y(t) \bigg] \qquad (5.59)
Equation (5.55) follows from replacing the expected value of the arrival process with the arrival rate λ_i. Equation (5.57) follows from upper bounding the per-slot arrival and departure rate each by 1. Equation (5.58) follows from the fact that T ≥ τQ. Equation (5.59) follows by defining B' = B + 2MT^2.
Throughout the proof, we refer to the law of iterated expectations [8], which states that for random variables X, Y, and Z, the conditional expectation of X given Y is expanded as
E_X[X \mid Y] = E_Z\big[ E_X[X \mid Y, Z] \,\big|\, Y \big] \qquad (5.60)
where the subscript on the expectation references the random variable over which the expectation is taken.
The remainder of the proof follows by showing that as queue lengths get large, the Lyapunov drift is upper bounded by a negative quantity. Consider the second term on the right-hand side of (5.59). This expectation is rewritten by conditioning on the delayed QLI at the current slot t + k and using the law of iterated expectations.
E\bigg[ \sum_{k=0}^{T-1} \sum_{i=1}^{M} Q_i(t + k - \tau_Q) \big( \lambda_i - D_i(t+k) \big) \,\bigg|\, Y(t) \bigg] = E\bigg[ \sum_{k=0}^{T-1} E\Big[ \sum_{i=1}^{M} Q_i(t + k - \tau_Q) \big( \lambda_i - D_i(t+k) \big) \,\Big|\, Q(t + k - \tau_Q) \Big] \,\bigg|\, Y(t) \bigg] \qquad (5.61)
To bound (5.61), we require the channel state at slot t + k to be independent of Y(t), which only holds if k is sufficiently large. Thus, we break the summation in (5.59) over the T slots into two parts: a smaller number of slots for which the value of k is small, and a larger number of slots where the value of k is large. An overly conservative bound is used for k < TSS + dmax, but the frame size T is chosen to ensure that the first TSS + dmax slots are a small fraction of the overall T slots. We drop the argument of the function TSS(ε), but the dependence on ε is clear.
\Delta_T(Y(t)) \le B' + \sum_{k=0}^{T-1} E\bigg[ \sum_{i=1}^{M} Q_i(t + k - \tau_Q) \big( \lambda_i - D_i(t+k) \big) \,\bigg|\, Y(t) \bigg]
\le B' + \sum_{k=0}^{T_{SS}+d_{max}-1} E\bigg[ \sum_{i=1}^{M} Q_i(t + k - \tau_Q) \big( \lambda_i - D_i(t+k) \big) \,\bigg|\, Y(t) \bigg] + \sum_{k=T_{SS}+d_{max}}^{T-1} E\bigg[ \sum_{i=1}^{M} Q_i(t + k - \tau_Q) \big( \lambda_i - D_i(t+k) \big) \,\bigg|\, Y(t) \bigg] \qquad (5.62)
For values of k < TSS + dmax, the upper bound follows by trivially upper bounding the arrivals by 1 and lower bounding the departures by 0 in each slot.
\sum_{k=0}^{T_{SS}+d_{max}-1} E\bigg[ \sum_{i=1}^{M} Q_i(t + k - \tau_Q) \big( \lambda_i - D_i(t+k) \big) \,\bigg|\, Y(t) \bigg] \le \sum_{k=0}^{T_{SS}+d_{max}-1} \sum_{i=1}^{M} Q_i(t + k - \tau_Q) \qquad (5.63)
\le \sum_{k=0}^{T_{SS}+d_{max}-1} \sum_{i=1}^{M} Q_i(t - \tau_Q) + \sum_{k=0}^{T_{SS}+d_{max}-1} \sum_{i=1}^{M} k \qquad (5.64)
\le (T_{SS} + d_{max}) \sum_{i=1}^{M} Q_i(t - \tau_Q) + \frac{1}{2} (T_{SS} + d_{max})^2 M \qquad (5.65)
where (5.64) follows from (5.51).
Now consider the time slots for which k ≥ TSS + dmax. For these slots, we have k ≥ τQ ≥ dmax ≥ d_r(i). For these time-slots, we bound the Lyapunov drift by computing the conditional departure rate under the DCPS policy, and showing that this policy must have a higher departure rate than all stationary policies. However, from Lemma 15, we know that a stationary policy exists which stabilizes the system, proving that the DCPS policy also stabilizes the system. To begin, the interior expectation in (5.61) is expanded as
E\bigg[ \sum_{i=1}^{M} Q_i(t + k - \tau_Q) \big( \lambda_i - D_i(t+k) \big) \,\bigg|\, Q(t + k - \tau_Q), Y(t) \bigg] \qquad (5.66)
= \sum_{i=1}^{M} Q_i(t + k - \tau_Q) \Big( \lambda_i - E\big[ D_i(t+k) \,\big|\, Q(t + k - \tau_Q), Y(t) \big] \Big) \qquad (5.67)
= \sum_{i=1}^{M} Q_i(t + k - \tau_Q) \lambda_i - \sum_{i=1}^{M} Q_i(t + k - \tau_Q)\, E\big[ D_i(t+k) \,\big|\, Q(t + k - \tau_Q), Y(t) \big]. \qquad (5.68)
Consider the right-most expression in equation (5.68). Let φ_i^r be a binary indicator variable denoting whether queue i is scheduled under policy DCPS as a function of the delayed QLI and delayed CSI from controller r, and let ψ_r be an indicator variable denoting whether node r is selected as the controller, as a function of delayed QLI only.
\sum_{i=1}^{M} Q_i(t + k - \tau_Q)\, E\big[ D_i(t+k) \,\big|\, Q(t + k - \tau_Q), Y(t) \big]
= \sum_{i=1}^{M} Q_i(t + k - \tau_Q)\, E\bigg[ \sum_{r=1}^{M} \psi_r\big( Q(t + k - \tau_Q) \big)\, \phi_i^r\big( S(t + k - d_r), Q(t + k - \tau_Q) \big)\, S_i(t+k) \,\bigg|\, Q(t + k - \tau_Q), Y(t) \bigg] \qquad (5.69)
= \sum_{i=1}^{M} \sum_{r=1}^{M} \psi_r\big( Q(t + k - \tau_Q) \big)\, Q_i(t + k - \tau_Q)\, E\Big[ E\big[ \phi_i^r\big( S(t + k - d_r), Q(t + k - \tau_Q) \big)\, S_i(t+k) \,\big|\, S_i(t + k - d_r(i)) \big] \,\Big|\, Q(t + k - \tau_Q), Y(t) \Big] \qquad (5.70)
= \sum_{i=1}^{M} \sum_{r=1}^{M} \psi_r\big( Q(t + k - \tau_Q) \big)\, Q_i(t + k - \tau_Q)\, E\Big[ \phi_i^r\big( S(t + k - d_r), Q(t + k - \tau_Q) \big)\, E\big[ S_i(t+k) \,\big|\, S_i(t + k - d_r(i)) \big] \,\Big|\, Q(t + k - \tau_Q), Y(t) \Big] \qquad (5.71)
= \sum_{i=1}^{M} \sum_{r=1}^{M} \psi_r\big( Q(t + k - \tau_Q) \big)\, Q_i(t + k - \tau_Q)\, E\Big[ \phi_i^r\big( S(t + k - d_r), Q(t + k - \tau_Q) \big)\, p_{S_i(t + k - d_r),1}^{d_r(i)} \,\Big|\, Q(t + k - \tau_Q), Y(t) \Big] \qquad (5.72)
Equation (5.70) follows since the controller placement under DCPS is completely determined given delayed QLI, and by applying the law of iterated expectations. Equation (5.71) follows since the link schedule under controller r for policy DCPS is completely determined given delayed QLI (Q(t + k − τQ)) and delayed CSI (S(t + k − d_r)). Lastly, equation (5.72) follows using the d_r(i)-step transition probability of the Markov chain.
Note that the throughput optimal policy is that which maximizes the expression in (5.72); however, the expectation cannot be computed because it requires knowledge of the conditional distribution of the channel state sequence given QLI, which depends on the arrival rate. However, when QLI is sufficiently delayed, the conditioning on QLI can be removed as follows.
P\big( S(t + k - d_r) = s \,\big|\, Q(t + k - \tau_Q), Y(t) \big)
= \sum_{s' \in \mathcal{S}} P\big( S(t + k - \tau_Q) = s' \,\big|\, Q(t + k - \tau_Q), Y(t) \big)\, P\big( S(t + k - d_r) = s \,\big|\, S(t + k - \tau_Q) = s', Q(t + k - \tau_Q), Y(t) \big) \qquad (5.73)
= \sum_{s' \in \mathcal{S}} P\big( S(t + k - \tau_Q) = s' \,\big|\, Q(t + k - \tau_Q), Y(t) \big)\, P\big( S(t + k - d_r) = s \,\big|\, S(t + k - \tau_Q) = s' \big) \qquad (5.74)
\ge \sum_{s' \in \mathcal{S}} P\big( S(t + k - \tau_Q) = s' \,\big|\, Q(t + k - \tau_Q), Y(t) \big) \Big( P\big( S(t + k - d_r) = s \big) - \frac{\epsilon}{2|\mathcal{S}|} \Big) \qquad (5.75)
= P\big( S(t + k - d_r) = s \big) - \frac{\epsilon}{2|\mathcal{S}|} \qquad (5.76)
Equation (5.73) follows from the law of total probability. Equation (5.74) holds using the fact that k ≥ τQ and the fact that, due to the Markov property of the channel state, the state at time t + k − d_r is conditionally independent of Y(t) and Q(t + k − τQ) given S(t + k − τQ). Equation (5.75) holds using the fact that τQ ≥ TSS + dmax, and thus, by the definition of TSS in (5.35), the conditional state distribution is within ε/(2|S|) of the stationary distribution.
Consequently, the expression in (5.72) can be bounded in terms of an unconditional expectation.
E\Big[ \phi_i^r\big( S(t + k - d_r), Q(t + k - \tau_Q) \big)\, p_{S_i(t + k - d_r(i)),1}^{d_r(i)} \,\Big|\, Q(t + k - \tau_Q), Y(t) \Big]
= \sum_{s \in \mathcal{S}} P\big( S(t + k - d_r) = s \,\big|\, Q(t + k - \tau_Q), Y(t) \big)\, \phi_i^r\big( s, Q(t + k - \tau_Q) \big)\, p_{s_i,1}^{d_r(i)} \qquad (5.77)
\ge \sum_{s \in \mathcal{S}} P\big( S(t + k - d_r) = s \big)\, \phi_i^r\big( s, Q(t + k - \tau_Q) \big)\, p_{s_i,1}^{d_r(i)} - \frac{\epsilon}{2|\mathcal{S}|} \sum_{s \in \mathcal{S}} \phi_i^r\big( s, Q(t + k - \tau_Q) \big)\, p_{s_i,1}^{d_r(i)} \qquad (5.78)
\ge E_{S(t + k - d_r)}\Big[ \phi_i^r\big( S(t + k - d_r), Q(t + k - \tau_Q) \big)\, p_{S_i(t + k - d_r(i)),1}^{d_r(i)} \Big] - \frac{\epsilon}{2} \qquad (5.79)
The inequality in (5.79) follows by upper bounding \phi_i^r(s, Q) \le 1 and p_{s_i,1}^{d_r(i)} \le 1.
Plugging (5.79) into (5.72) yields
\sum_{i=1}^{M} \sum_{r=1}^{M} \psi_r\big( Q(t + k - \tau_Q) \big)\, Q_i(t + k - \tau_Q)\, E\Big[ \phi_i^r\big( S(t + k - d_r), Q(t + k - \tau_Q) \big)\, p_{S_i(t + k - d_r(i)),1}^{d_r(i)} \,\Big|\, Q(t + k - \tau_Q), Y(t) \Big]
\ge \sum_{i=1}^{M} \sum_{r=1}^{M} \psi_r\big( Q(t + k - \tau_Q) \big)\, Q_i(t + k - \tau_Q) \Big( E\Big[ \phi_i^r\big( S(t + k - d_r), Q(t + k - \tau_Q) \big)\, p_{S_i(t + k - d_r(i)),1}^{d_r(i)} \Big] - \frac{\epsilon}{2} \Big) \qquad (5.80)
\ge -\frac{\epsilon}{2} \sum_{i=1}^{M} Q_i(t + k - \tau_Q) + \sum_{i=1}^{M} \sum_{r=1}^{M} \psi_r\big( Q(t + k - \tau_Q) \big)\, Q_i(t + k - \tau_Q)\, E\Big[ \phi_i^r\big( S(t + k - d_r), Q(t + k - \tau_Q) \big)\, p_{S_i(t + k - d_r(i)),1}^{d_r(i)} \Big] \qquad (5.81)
where the inequality in (5.81) follows from upper bounding \sum_{r=1}^{M} \psi_r\big( Q(t + k - \tau_Q) \big) by 1.
Under the DCPS policy, the service rate as a function of the controller and delayed CSI observation is given by:
\sum_{i=1}^{M} \phi_i^r\big( s, Q(t + k - \tau_Q) \big)\, Q_i(t + k - \tau_Q)\, p_{s_i,1}^{d_r(i)} = \max_i Q_i(t + k - \tau_Q)\, p_{s_i,1}^{d_r(i)}. \qquad (5.82)
Similarly, the expression for the expected value of the departure process can be rewritten using (5.82) and the structure of the controller placement policy of DCPS.
\sum_{r=1}^{M} \psi_r\big( Q(t + k - \tau_Q) \big)\, E_S\Big[ \max_i Q_i(t + k - \tau_Q)\, p_{s_i,1}^{d_r(i)} \Big] = \max_r E_S\Big[ \max_i Q_i(t + k - \tau_Q)\, p_{s_i,1}^{d_r(i)} \Big] \qquad (5.83)
Combining equations (5.83) and (5.81), and plugging this into equation (5.68), yields
\sum_{i=1}^{M} Q_i(t + k - \tau_Q) \lambda_i - \sum_{i=1}^{M} Q_i(t + k - \tau_Q)\, E\big[ D_i(t+k) \,\big|\, Q(t + k - \tau_Q), Y(t) \big] \qquad (5.84)
\le \sum_{i=1}^{M} Q_i(t + k - \tau_Q) \lambda_i - \max_r E_S\Big[ \max_i Q_i(t + k - \tau_Q)\, p_{s_i,1}^{d_r(i)} \Big] + \frac{\epsilon}{2} \sum_{i=1}^{M} Q_i(t + k - \tau_Q) \qquad (5.85)
Now, we reintroduce the stationary policy of Lemma 15 to complete the bound. Recall that for any λ ∈ Λ, there exists a stationary policy which assigns controller r with probability β_r, and schedules node i for transmission with probability α_i^r(s) for delayed CSI s ∈ S, such that
\lambda_i + \epsilon \le \sum_{s \in \mathcal{S}} P_S(s) \sum_{r=1}^{M} \beta_r\, \alpha_i^r(s)\, p_{s_i,1}^{d_r(i)} \qquad \forall i \in \{1, \dots, M\} \qquad (5.86)
Note that the ε in (5.35) and the ε in (5.86) are designed to be equal. Define μ_i^{STAT} to be the average departure rate of queue i under this stationary policy. In other words,
\mu_i^{STAT} \triangleq \sum_{s \in \mathcal{S}} P_S(s) \sum_{r=1}^{M} \beta_r\, \alpha_i^r(s)\, p_{s_i,1}^{d_r(i)} \qquad (5.87)
The expression in (5.85) is rewritten by adding and subtracting identical terms corresponding to the stationary policy μ^{STAT}.
\sum_{i=1}^{M} Q_i(t + k - \tau_Q) \lambda_i - \max_r E_S\Big[ \max_i Q_i(t + k - \tau_Q)\, p_{s_i,1}^{d_r(i)} \Big] + \frac{\epsilon}{2} \sum_{i=1}^{M} Q_i(t + k - \tau_Q) + \sum_{i=1}^{M} Q_i(t + k - \tau_Q)\, \mu_i^{STAT} - \sum_{i=1}^{M} Q_i(t + k - \tau_Q)\, \mu_i^{STAT} \qquad (5.88)
= \sum_{i=1}^{M} Q_i(t + k - \tau_Q) \big( \lambda_i - \mu_i^{STAT} \big) + \sum_{i=1}^{M} Q_i(t + k - \tau_Q)\, \mu_i^{STAT} - \max_r E_S\Big[ \max_i Q_i(t + k - \tau_Q)\, p_{s_i,1}^{d_r(i)} \Big] + \frac{\epsilon}{2} \sum_{i=1}^{M} Q_i(t + k - \tau_Q) \qquad (5.89)
The first term in (5.89) is bounded using (5.86), which holds because the stationary
policy stabilizes the system.
\sum_{i=1}^{M} Q_i(t + k - \tau_Q) \big( \lambda_i - \mu_i^{STAT} \big) \le -\epsilon \sum_{i=1}^{M} Q_i(t + k - \tau_Q) \qquad (5.90)
The second term in (5.89) is bounded by relating the stationary policy to the DCPS policy:
\sum_{i=1}^{M} Q_i(t + k - \tau_Q)\, \mu_i^{STAT} = \sum_{i=1}^{M} Q_i(t + k - \tau_Q) \sum_{s \in \mathcal{S}} \sum_{r=1}^{M} P_S\big( S(t + k - d_r) = s \big)\, \beta_r\, \alpha_i^r(s)\, p_{s_i,1}^{d_r(i)} \qquad (5.91)
= \sum_{r=1}^{M} \beta_r \sum_{s \in \mathcal{S}} P_S\big( S(t + k - d_r) = s \big) \sum_{i=1}^{M} Q_i(t + k - \tau_Q)\, \alpha_i^r(s)\, p_{s_i,1}^{d_r(i)} \qquad (5.92)
\le \sum_{r=1}^{M} \beta_r \sum_{s \in \mathcal{S}} P_S\big( S(t + k - d_r) = s \big) \max_i Q_i(t + k - \tau_Q)\, p_{s_i,1}^{d_r(i)} \qquad (5.93)
\le \max_r E_S\Big[ \max_i Q_i(t + k - \tau_Q)\, p_{s_i,1}^{d_r(i)} \Big] \qquad (5.94)
Returning to (5.89) and applying the inequalities in (5.94) and (5.90):
E\bigg[ \sum_{i=1}^{M} Q_i(t + k - \tau_Q) \big( \lambda_i - D_i(t+k) \big) \,\bigg|\, Q(t + k - \tau_Q), Y(t) \bigg]
\le \sum_{i=1}^{M} Q_i(t + k - \tau_Q) \big( \lambda_i - \mu_i^{STAT} \big) + \sum_{i=1}^{M} Q_i(t + k - \tau_Q)\, \mu_i^{STAT} - \max_r E_S\Big[ \max_i Q_i(t + k - \tau_Q)\, p_{s_i,1}^{d_r(i)} \Big] + \frac{\epsilon}{2} \sum_{i=1}^{M} Q_i(t + k - \tau_Q)
\le -\frac{\epsilon}{2} \sum_{i=1}^{M} Q_i(t + k - \tau_Q) \qquad (5.95)
To conclude the proof, the bound in (5.51) is used to revert the QLI at time t + k − τQ to a queue length at time t − τQ, which is known through knowledge of Y(t).
-\frac{\epsilon}{2} \sum_{i=1}^{M} Q_i(t + k - \tau_Q) \le -\frac{\epsilon}{2} \sum_{i=1}^{M} Q_i(t - \tau_Q) + \frac{\epsilon}{2} M k \qquad (5.96)
Now, we have an upper bound for the slots k ≥ TSS + dmax to combine with the bound for k < TSS + dmax. Plugging these bounds into the drift bound of (5.62) yields
\Delta_T(Y(t)) \le B' + (T_{SS} + d_{max}) \sum_{i=1}^{M} Q_i(t - \tau_Q) + \frac{1}{2} (T_{SS} + d_{max})^2 M + \sum_{k=T_{SS}+d_{max}}^{T-1} E\bigg[ -\frac{\epsilon}{2} \sum_{i=1}^{M} Q_i(t - \tau_Q) + \frac{\epsilon}{2} M k \,\bigg|\, Y(t) \bigg] \qquad (5.97)
\le B' + (T_{SS} + d_{max}) \sum_{i=1}^{M} Q_i(t - \tau_Q) + \frac{1}{2} (T_{SS} + d_{max})^2 M - \frac{\epsilon}{2} (T - T_{SS} - d_{max}) \sum_{i=1}^{M} Q_i(t - \tau_Q) + \sum_{k=T_{SS}+d_{max}}^{T-1} \frac{\epsilon}{2} M k \qquad (5.98)
\le B' + (T_{SS} + d_{max}) \sum_{i=1}^{M} Q_i(t - \tau_Q) + \frac{\epsilon}{4} M \big( T^2 - (T_{SS} + d_{max})^2 \big) + \frac{1}{2} (T_{SS} + d_{max})^2 M - \frac{\epsilon}{2} (T - T_{SS} - d_{max}) \sum_{i=1}^{M} Q_i(t - \tau_Q) \qquad (5.99)
\le B' + \frac{1}{2} (T_{SS} + d_{max})^2 M \Big( 1 - \frac{\epsilon}{2} \Big) + \frac{\epsilon}{4} M T^2 + (T_{SS} + d_{max}) \sum_{i=1}^{M} Q_i(t - \tau_Q) - \frac{\epsilon}{2} (T - T_{SS} - d_{max}) \sum_{i=1}^{M} Q_i(t - \tau_Q) \qquad (5.100)
Thus, for any ξ > 0, T satisfying
T \ge \frac{2 \big( 1 + \frac{\epsilon}{2} \big) (T_{SS} + d_{max}) + 2\xi}{\epsilon} \qquad (5.101)
and positive constant K satisfying
K = B' + \frac{1}{2} (T_{SS} + d_{max})^2 M \Big( 1 - \frac{\epsilon}{2} \Big) + \frac{\epsilon}{4} M T^2, \qquad (5.102)
it follows that
\Delta_T(Y(t)) \le K - \xi \sum_{i=1}^{M} Q_i(t - \tau_Q) \qquad (5.103)
Thus, for large enough queue backlogs, the T-slot Lyapunov drift is negative, and from [48] it follows that the overall system is stable under the DCPS policy.
5.5.2
Proof of Corollary 5
Corollary 5: Consider a system of M nodes, where only one can transmit at each
time. Assume the controller has full knowledge of its own channel state and d-slot
delayed CSI for each other channel, as in Figure 5-11. At time t, the DCPS policy
places the controller at the node with the largest backlog at time t − τQ .
r^* = \arg\max_r Q_r(t - \tau_Q) \qquad (5.104)
Proof. Recall that the optimal policy at each time is the DCPS policy in Theorem 21, where the controller is chosen to maximize the expected maximum weight schedule. Let Q^{(1)}, \dots, Q^{(M)} be the ordering of delayed queue lengths Q(t − τQ), such that Q^{(1)} ≥ Q^{(2)} ≥ \dots ≥ Q^{(M)}. Consider placing the controller at the node corresponding to Q^{(1)}. Let k_2 be the largest index i such that Q^{(i)} p_{11}^d ≥ Q^{(2)} p_{01}^d. The expected max-weight is a random variable, which takes values determined by the CSI. Let MW_i be the weight of the schedule activated by a controller at the i-th largest queue. The expected max weight of a controller at Q^{(1)} is given by
E[MW_1] = Q^{(1)} \pi + \sum_{i=2}^{k_2} Q^{(i)} \pi (1 - \pi)^{i-1} p_{11}^d + Q^{(2)} p_{01}^d (1 - \pi)^{k_2} \qquad (5.105)
Equation (5.105) is derived as follows. Since Q^{(1)} is the largest queue, if that channel is ON, the max-weight policy transmits from Q^{(1)}. If that channel is OFF, then the belief of that channel is zero, and it will not be used. Transmitting from Q^{(i)} is optimal only if Q^{(j)} is OFF for all j < i, since the Q^{(i)} are sorted in decreasing order. By the definition of k_2, for j > k_2, Q^{(j)} p_{11}^d < Q^{(2)} p_{01}^d, so it is optimal to schedule Q^{(2)} when Q^{(i)} is OFF for all i ≤ k_2.
Now consider placing the controller at the node corresponding to queue Q^{(j)}, for j ≥ 2. Let k_1 be the largest index such that Q^{(k_1)} p_{11}^d ≥ Q^{(1)} p_{01}^d. Similarly, define k_j' to be the largest index such that Q^{(k_j')} p_{11}^d ≥ Q^{(j)}. The expected max weight is computed for two cases, depending on the relationship between k_1 and k_j'.
First, consider the case where Q^{(j)} ≤ Q^{(1)} p_{01}^d, i.e., k_1 ≤ k_j'. In this case, it is never optimal to transmit over the channel corresponding to Q^{(j)}, regardless of its delayed CSI. The expected max-weight is given by
E[MW_j] = \pi p_{11}^d \sum_{i=1}^{k_1} Q^{(i)} (1 - \pi)^{i-1} + p_{01}^d Q^{(1)} (1 - \pi)^{k_1} \qquad (5.106)
Compare the expected max weight between the controller at Q^{(1)} and Q^{(j)}.
E[MW_1 - MW_j] = Q^{(1)} \pi + \sum_{i=2}^{k_2} Q^{(i)} \pi (1 - \pi)^{i-1} p_{11}^d + Q^{(2)} p_{01}^d (1 - \pi)^{k_2} - Q^{(1)} \pi p_{11}^d - \sum_{i=2}^{k_1} Q^{(i)} \pi (1 - \pi)^{i-1} p_{11}^d - Q^{(1)} (1 - \pi)^{k_1} p_{01}^d \qquad (5.107)
= Q^{(1)} \pi \big( 1 - p_{11}^d \big) + p_{11}^d \sum_{i=k_1+1}^{k_2} Q^{(i)} \pi (1 - \pi)^{i-1} + Q^{(2)} p_{01}^d (1 - \pi)^{k_2} - Q^{(1)} (1 - \pi)^{k_1} p_{01}^d \qquad (5.108)
\ge Q^{(1)} \pi p_{10}^d - Q^{(1)} (1 - \pi)^{k_1} p_{01}^d + Q^{(2)} p_{01}^d (1 - \pi)^{k_2} \qquad (5.109)
= Q^{(1)} \pi p_{10}^d - Q^{(1)} \pi (1 - \pi)^{k_1 - 1} p_{10}^d + Q^{(2)} p_{01}^d (1 - \pi)^{k_2} \qquad (5.110)
= Q^{(1)} \pi p_{10}^d \big( 1 - (1 - \pi)^{k_1 - 1} \big) + Q^{(2)} p_{01}^d (1 - \pi)^{k_2} \ge 0 \qquad (5.111)
where (5.109) follows from Q^{(i)} \ge 0, and (5.110) follows from the identity \pi p_{10}^d = (1 - \pi) p_{01}^d.
Now consider the case where Q^{(j)} ≥ Q^{(1)} p_{01}^d. In this case, there exists a state such that it is optimal to transmit over Q^{(j)}. The max-weight expression is given by
E[MW_j] = \pi p_{11}^d \sum_{i=1}^{k_j'} Q^{(i)} (1 - \pi)^{i-1} + Q^{(j)} \pi (1 - \pi)^{k_j'} + \pi p_{11}^d \sum_{i=k_j'+1}^{k_1} Q^{(i)} (1 - \pi)^{i} + p_{01}^d Q^{(1)} (1 - \pi)^{k_1 + 1} \qquad (5.112)
Comparing the expected max weight between the controller at Q(1) and Q(j) .
\[
\begin{aligned}
E[MW_1 - MW_j] &= Q^{(1)}\pi + p_{11}^d \sum_{i=2}^{k_2} Q^{(i)} \pi(1-\pi)^{i-1} + Q^{(2)} p_{01}^d (1-\pi)^{k_2} - Q^{(1)}\pi p_{11}^d \\
&\quad - \pi p_{11}^d \sum_{i=2}^{k_j'} Q^{(i)} (1-\pi)^{i-1} - Q^{(j)} \pi (1-\pi)^{k_j'} \\
&\quad - \pi p_{11}^d \sum_{i=k_j'+1}^{k_1} Q^{(i)} (1-\pi)^{i} - p_{01}^d Q^{(1)} (1-\pi)^{k_1+1} && (5.113) \\
&= Q^{(1)}\pi p_{10}^d - Q^{(j)} \pi (1-\pi)^{k_j'} p_{10}^d + \pi p_{11}^d \sum_{i=k_j'+1}^{k_1} Q^{(i)} \pi(1-\pi)^{i-1} \\
&\quad + p_{11}^d \sum_{i=k_1+1}^{k_2} Q^{(i)} \pi(1-\pi)^{i-1} + Q^{(2)} p_{01}^d (1-\pi)^{k_2} - Q^{(1)} (1-\pi)^{k_1+1} p_{01}^d && (5.114)
\end{aligned}
\]
Equation (5.114) follows from combining like terms and breaking up the summation over the interval $i \in [2, k_2]$ into three intervals, $[2, k_j']$, $[k_j'+1, k_1]$, and $[k_1+1, k_2]$, as well as an additional term for $Q^{(j)}$. The summations are bounded as follows:
\[
\begin{aligned}
\pi p_{11}^d \sum_{i=k_j'+1}^{k_1} Q^{(i)} \pi(1-\pi)^{i-1} &+ p_{11}^d \sum_{i=k_1+1}^{k_2} Q^{(i)} \pi(1-\pi)^{i-1} \\
&\ge \pi Q^{(1)} p_{01}^d \sum_{i=k_j'+1}^{k_1} \pi(1-\pi)^{i-1} + Q^{(2)} p_{01}^d \sum_{i=k_1+1}^{k_2} \pi(1-\pi)^{i-1} && (5.115) \\
&= \pi Q^{(1)} p_{01}^d \big((1-\pi)^{k_j'} - (1-\pi)^{k_1}\big) + Q^{(2)} p_{01}^d \big((1-\pi)^{k_1} - (1-\pi)^{k_2}\big) && (5.116)
\end{aligned}
\]
The inequality in (5.115) follows from the fact that $Q^{(1)} p_{01}^d \le Q^{(i)} p_{11}^d$ for $i \le k_1$, and $Q^{(2)} p_{01}^d \le Q^{(i)} p_{11}^d$ for $i \le k_2$. Plugging this into equation (5.114) yields
\[
\begin{aligned}
E[MW_1 - MW_j] &\ge Q^{(1)}\pi p_{10}^d - Q^{(j)}\pi(1-\pi)^{k_j'} p_{10}^d + Q^{(2)} p_{01}^d (1-\pi)^{k_2} - Q^{(1)}(1-\pi)^{k_1+1} p_{01}^d \\
&\quad + \pi Q^{(1)} p_{01}^d \big((1-\pi)^{k_j'} - (1-\pi)^{k_1}\big) + Q^{(2)} p_{01}^d \big((1-\pi)^{k_1} - (1-\pi)^{k_2}\big) \\
&\ge Q^{(1)}\pi p_{10}^d - Q^{(j)}\pi(1-\pi)^{k_j'} p_{10}^d - Q^{(1)}(1-\pi)^{k_1+1} p_{01}^d + Q^{(2)} p_{01}^d (1-\pi)^{k_1} && (5.117) \\
&\ge Q^{(1)}\pi p_{10}^d \big(1 - (1-\pi)^{k_1}\big) - Q^{(j)}\pi(1-\pi)^{k_j'} p_{10}^d + Q^{(2)}\pi p_{10}^d (1-\pi)^{k_1-1} && (5.118) \\
&\ge Q^{(2)}\pi p_{10}^d \big(1 - (1-\pi)^{k_1}\big) - Q^{(2)}\pi(1-\pi)^{k_j'} p_{10}^d + Q^{(2)}\pi p_{10}^d (1-\pi)^{k_1-1} && (5.119) \\
&= Q^{(2)}\pi p_{10}^d \big(1 - (1-\pi)^{k_1} - (1-\pi)^{k_j'} + (1-\pi)^{k_1-1}\big) \ \ge\ 0 && (5.120)
\end{aligned}
\]
The inequality in (5.117) follows from $k_j' \le k_1$ and canceling the $Q^{(2)}$ terms; (5.118) follows from the identity $\pi p_{10}^d = (1-\pi) p_{01}^d$; and (5.119) holds since $Q^{(1)} \ge Q^{(2)} \ge Q^{(j)}$.
Therefore, for all $j \ge 2$, placing the controller at the node corresponding to $Q^{(j)}$ results in a lower expected max weight than placing it at the node corresponding to the longest queue. Thus, placing the controller at the longest queue is the optimal controller placement policy.
5.5.3  Proof of Lemma 16
Lemma 16. Suppose there exists a policy $P \in \Pi$ that stabilizes the system. Then there exist variables $\beta_r(s)$ and $\alpha_i^r(s')$ such that (5.14) has a solution with $\epsilon^* \ge 0$.
Proof. Suppose the system is stabilized with some control policy consisting of functions $\beta(t)$, which choose a controller depending on the QLI and CSI only through $S(t - d_{\max})$, and $\alpha^r(t)$, which choose a link activation based on delayed CSI and QLI, with delays relative to the controller. Without loss of generality, let $\beta_r(t)$ be an indicator function signaling whether node $r$ is chosen to be the controller at time $t$, and let $\alpha_i^r(t)$ be an indicator signaling whether link $i$ is scheduled for transmission at time $t$. Under any
such scheme, the following relationship holds between arrivals, departures, and queue backlogs:
\[ \sum_{\tau=1}^{t} A_i(\tau) \le Q_i(t) + \sum_{\tau=1}^{t} \sum_{r=1}^{M} \mu_i^r\big(\beta_r(\tau), \alpha_i^r(\tau)\big), \tag{5.121} \]
where $\mu_i$ is the service rate of the $i$th queue as a function of the control decisions. Writing out the expression for $\mu_i$ yields
\[ \sum_{\tau=1}^{t} A_i(\tau) \le Q_i(t) + \sum_{\tau=1}^{t} \sum_{r=1}^{M} \beta_r(\tau)\, \alpha_i^r(\tau)\, E\big[S_i(\tau) \mid S_i(\tau - d_r(i))\big]. \tag{5.122} \]
Let $T_S$ be the subintervals of $[0, t]$ such that $S(t - d_{\max}) = S$. Further, let $T_S^r$ be the subintervals of $T_S$ such that $r$ is the controller. Lastly, define $T_{S,S'}^r$ to be the subintervals of $T_S^r$ such that the controller observes delayed CSI $S(t - d_{\max}) = S$ and $S(t - d_r(i)) = S'$. Let $|T_S|$, $|T_S^r|$, and $|T_{S,S'}^r|$ be the aggregate lengths of those intervals. Because the arrival and channel state processes are ergodic, and the number of channel states and queues is finite, there exists a time $t_1$ such that for all $t \ge t_1$, the empirical average arrival rates and state occupancy fractions are within $\epsilon$ of their expectations:
\[ \frac{1}{t}\sum_{\tau=1}^{t} A_i(\tau) \ge \lambda_i - \epsilon \tag{5.123} \]
\[ \frac{1}{t}\big|T_{S,S'}^r\big| \le P\big(S(t - d_{\max}) = S,\ S(t - d_r) = S'\big) + \epsilon \tag{5.124} \]
The above equations hold with probability one by the strong law of large numbers. Furthermore, since the system is stable under the considered policy, [48] shows that there exists a $V$ such that for an arbitrarily large $t$,
\[ P\left(\sum_{i=1}^{M} Q_i(t) \le V\right) \ge \frac{1}{2}. \tag{5.125} \]
Thus, let $t$ be large enough such that $t \ge t_1$ and $\frac{V}{t} \le \epsilon$. The inequality (5.122) is evaluated at this time $t$, and by dividing by $t$ and assuming $\sum_{i=1}^{M} Q_i(t) \le V$, it follows that
\[ \frac{1}{t}\sum_{\tau=1}^{t} A_i(\tau) \le \frac{V}{t} + \frac{1}{t}\sum_{\tau=1}^{t}\sum_{r=1}^{M} \beta_r(\tau)\,\alpha_i(\tau)\, E\big[S_i(t) \mid S_i(t - d_r(i))\big] \tag{5.126} \]
\[ \lambda_i - \epsilon \le \frac{1}{t}\sum_{\tau=1}^{t} A_i(\tau) \le \epsilon + \frac{1}{t}\sum_{\tau=1}^{t}\sum_{r=1}^{M} \beta_r(\tau)\,\alpha_i(\tau)\, E\big[S_i(t) \mid S_i(t - d_r(i))\big] \tag{5.127} \]
The above inequality follows from (5.123) and holds with probability $\frac{1}{2}$ due to (5.125).
Break up the above summation based on the globally delayed CSI $S(t - d_{\max})$, and then further based on the selected controllers, as determined by $\beta_r(\tau)$:
\[
\begin{aligned}
\lambda_i &\le 2\epsilon + \frac{1}{t}\sum_{S \in \mathcal{S}} \sum_{\tau \in T_S} \sum_{r=1}^{M} \beta_r(\tau)\,\alpha_i(\tau)\, E\big[S_i(t) \mid S_i(t - d_r(i))\big] && (5.128) \\
&= 2\epsilon + \sum_{S \in \mathcal{S}} \frac{|T_S|}{t}\, \frac{1}{|T_S|} \sum_{\tau \in T_S} \sum_{r=1}^{M} \beta_r(\tau)\,\alpha_i(\tau)\, E\big[S_i(t) \mid S_i(t - d_r(i))\big] && (5.129) \\
&= 2\epsilon + \sum_{S \in \mathcal{S}} \frac{|T_S|}{t} \sum_{r=1}^{M} \frac{|T_S^r|}{|T_S|}\, \frac{1}{|T_S^r|} \sum_{\tau \in T_S^r} \alpha_i(\tau)\, E\big[S_i(t) \mid S_i(t - d_r(i))\big] && (5.130) \\
&= 2\epsilon + \sum_{S \in \mathcal{S}} \frac{|T_S|}{t} \sum_{r=1}^{M} \bar{\beta}_r(S)\, \frac{1}{|T_S^r|} \sum_{\tau \in T_S^r} \alpha_i(\tau)\, E\big[S_i(t) \mid S_i(t - d_r(i))\big] && (5.131)
\end{aligned}
\]
The last equation follows from defining
\[ \bar{\beta}_r(S) \triangleq \frac{|T_S^r|}{|T_S|}, \tag{5.132} \]
the fraction of time that r is chosen as the controller when the delayed state satisfies
S(t − dmax ) = S.
Now, break the summation over $T_S^r$ into separate summations over the sub-intervals $T_{S,S'}^r$ for each observed $S(t - d_r(i)) = S'$:
\[
\begin{aligned}
\lambda_i &\le 2\epsilon + \sum_{S \in \mathcal{S}} \frac{|T_S|}{t} \sum_{r=1}^{M} \bar{\beta}_r(S) \sum_{S' \in \mathcal{S}} \frac{1}{|T_S^r|} \sum_{\tau \in T_{S,S'}^r} \alpha_i(\tau)\, p^{d_r(i)}_{S_i',1} && (5.133) \\
&= 2\epsilon + \sum_{S \in \mathcal{S}} \frac{|T_S|}{t} \sum_{r=1}^{M} \bar{\beta}_r(S) \sum_{S' \in \mathcal{S}} \frac{|T_{S,S'}^r|}{|T_S^r|}\, \frac{1}{|T_{S,S'}^r|} \sum_{\tau \in T_{S,S'}^r} \alpha_i(\tau)\, p^{d_r(i)}_{S_i',1} && (5.134)
\end{aligned}
\]
\[
\begin{aligned}
&= 2\epsilon + \sum_{S \in \mathcal{S}} \frac{|T_S|}{t} \sum_{r=1}^{M} \bar{\beta}_r(S) \sum_{S' \in \mathcal{S}} \frac{|T_{S,S'}^r|}{|T_S^r|}\, \bar{\alpha}_i^r(S, S')\, p^{d_r(i)}_{S_i',1} && (5.135) \\
&= 2\epsilon + \sum_{S \in \mathcal{S}} \sum_{S' \in \mathcal{S}} \frac{|T_{S,S'}^r|}{t} \sum_{r=1}^{M} \bar{\beta}_r(S)\, \bar{\alpha}_i^r(S, S')\, p^{d_r(i)}_{S_i',1} && (5.136) \\
&\le \sum_{S \in \mathcal{S}} \sum_{S' \in \mathcal{S}} P\big(S_i(t - d_{\max}) = S,\ S_i(t - d_r(i)) = S'\big) \sum_{r=1}^{M} \bar{\beta}_r(S)\, \bar{\alpha}_i^r(S, S')\, p^{d_r(i)}_{S_i',1} + \big(2 + |\mathcal{S}|^2\big)\epsilon && (5.137)
\end{aligned}
\]
where (5.135) follows by defining
\[ \bar{\alpha}_i^r(S, S') \triangleq \frac{1}{|T_{S,S'}^r|} \sum_{\tau \in T_{S,S'}^r} \alpha_i(\tau), \tag{5.138} \]
and (5.137) follows from (5.124).
Because the original control functions satisfy $\sum_r \beta_r(t) \le 1$ and $\sum_i \alpha_i(t) \le 1$, it follows that $\bar{\beta}_r$ and $\bar{\alpha}_i^r$ satisfy these same constraints. Furthermore, the fraction of time node $r$ is the controller, $\bar{\beta}_r$, depends on channel state information only through $S(t - d_{\max})$. The link schedule variable $\bar{\alpha}_i^r$ is a stationary probability as a function of both $S(t - d_{\max})$ and $S(t - d_r(i))$; however, due to the Markov property of the system, the optimal policy does not depend on the older CSI.
Inequality (5.137) holds with probability greater than $\frac{1}{2}$, implying that there exists a set of stationary control decisions $\beta_r(S)$ and $\alpha_i^r(S, S')$ satisfying the constraints in (5.137) for all $i$. If there were no such stationary policy, then this inequality would hold with probability 0. Therefore, $\lambda$ is arbitrarily close to a point in the region $\Lambda$, implying the constraints imposed by $\Lambda$ are necessary for system stability.
5.5.4  Proof of Theorem 23
Theorem 23: Consider the modified DCPS policy, which operates in two steps. First, choose a controller by solving the following optimization as a function of the delayed queue backlogs $Q(t - \tau_Q)$ and delayed CSI $S(t - d_{\max})$:
\[ r^* = \arg\max_r \sum_{s \in \mathcal{S}} P\big(S(t - d_r(i)) = s \mid S(t - d_{\max})\big) \max_i Q_i(t - \tau_Q)\, p^{d_r(i)}_{s_i,1} \tag{5.139} \]
The controller observes CSI $S(t - d_{r^*}(i)) = s$, and schedules the following queue to transmit:
\[ i^* = \arg\max_i Q_i(t - \tau_Q)\, p^{d_{r^*}(i)}_{s_i,1} \tag{5.140} \]
The DCPS policy in (5.40) and (5.41) is throughput optimal if $\tau_Q > d_{\max}$.
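The two-step decision rule in (5.139)-(5.140) can be sketched numerically for independent two-state ON-OFF channels. The sketch below is illustrative, not the thesis's implementation: the helper names (`on_prob_after`, `dcps_choose_controller`, `dcps_schedule_link`) and the delay layout `d[r][i]` (the CSI delay of channel `i` as seen from candidate controller `r`) are assumptions, and the expectation in (5.139) is taken by enumerating the joint states the controller would observe.

```python
import itertools

def on_prob_after(s, k, p, q):
    # P(S(t+k) = ON | S(t) = s) for a two-state chain with
    # P(OFF->ON) = p and P(ON->OFF) = q; closed form via spectral decomposition.
    pi = p / (p + q)                          # stationary ON probability
    return pi + (s - pi) * (1.0 - p - q) ** k

def dcps_choose_controller(Q, s_global, d, d_max, p, q):
    """Step 1 (eq. 5.139): pick the controller r maximizing the expected
    max-weight, averaging over the CSI it would observe with delay d[r][i],
    conditioned on the globally known states S(t - d_max)."""
    M = len(Q)
    def expected_mw(r):
        # P(S_i(t - d[r][i]) = ON | S_i(t - d_max)): evolve d_max - d[r][i] steps
        obs_on = [on_prob_after(s_global[i], d_max - d[r][i], p, q) for i in range(M)]
        total = 0.0
        for states in itertools.product((0, 1), repeat=M):
            pr = 1.0
            for i, s in enumerate(states):
                pr *= obs_on[i] if s else 1.0 - obs_on[i]
            # weight of queue i given its observed state: Q_i * p^{d[r][i]}_{s_i,1}
            total += pr * max(Q[i] * on_prob_after(states[i], d[r][i], p, q)
                              for i in range(M))
        return total
    return max(range(M), key=expected_mw)

def dcps_schedule_link(Q, s_obs, d_r, p, q):
    """Step 2 (eq. 5.140): given the CSI observed by the chosen controller,
    serve the queue maximizing Q_i * p^{d_r[i]}_{s_i,1}."""
    return max(range(len(Q)),
               key=lambda i: Q[i] * on_prob_after(s_obs[i], d_r[i], p, q))
```

For example, with two queues `Q = [5, 1]` and each candidate controller observing its own channel with zero delay, the controller co-located with the longest queue is selected, consistent with the controller placement result proved above.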
Proof of Theorem 23. The proof of this theorem follows the same structure as the proof of Theorem 21. We use the same Lyapunov function in (5.46) and the drift expression in (5.47). We bound the Lyapunov drift under DCPS by combining (5.32), (5.46), and (5.47), and show that for large queue lengths, the Lyapunov drift is negative. Let $D_i(t) = D_i^{\mathrm{DCPS}}(t)$ refer to the departure process of policy DCPS. Recall, from (5.59), the Lyapunov drift is bounded as
\[ \Delta_T(Y(t)) \le B' + E\left[\sum_{k=0}^{T-1}\sum_{i=1}^{M} Q_i(t + k - \tau_Q)\big(\lambda_i - D_i(t + k)\big) \,\Big|\, Y(t)\right] \tag{5.141} \]
Now consider the last term on the right-hand side of the above equation. This expectation is rewritten by conditioning on the delayed QLI at the current slot $t + k$, as well as the globally available delayed CSI, and using the law of iterated expectations:
\[
\begin{aligned}
&E\left[\sum_{k=0}^{T-1}\sum_{i=1}^{M} Q_i(t + k - \tau_Q)\big(\lambda_i - D_i(t + k)\big) \,\Big|\, Y(t)\right] \\
&= E\left[\sum_{k=0}^{T-1} E\left[\sum_{i=1}^{M} Q_i(t + k - \tau_Q)\big(\lambda_i - D_i(t + k)\big) \,\Big|\, S(t + k - d_{\max}), Q(t + k - \tau_Q)\right] \Big|\, Y(t)\right] && (5.142)
\end{aligned}
\]
Note that (5.142) differs from (5.61) because of the extra conditioning on the channel state. Similarly to the proof of Theorem 21, we break the summation over the $T$ slots into two parts: a smaller number of slots for which the value of $k$ is small, and a larger number of slots where the value of $k$ is large. An overly conservative bound is used for $k < T_{SS} + d_{\max}$, but the frame size $T$ is chosen to ensure that the first $T_{SS} + d_{\max}$ slots are a small fraction of the overall $T$ slots. We drop the argument of the function $T_{SS}(\epsilon)$, but the dependence on $\epsilon$ is clear.
\[
\begin{aligned}
\Delta_T(Y(t)) &\le B' + \sum_{k=0}^{T-1} E\left[\sum_{i=1}^{M} Q_i(t + k - \tau_Q)\big(\lambda_i - D_i(t + k)\big) \,\Big|\, Y(t)\right] \\
&\le B' + \sum_{k=0}^{T_{SS}+d_{\max}-1} E\left[\sum_{i=1}^{M} Q_i(t + k - \tau_Q)\big(\lambda_i - D_i(t + k)\big) \,\Big|\, Y(t)\right] \\
&\quad + \sum_{k=T_{SS}+d_{\max}}^{T} E\left[\sum_{i=1}^{M} Q_i(t + k - \tau_Q)\big(\lambda_i - D_i(t + k)\big) \,\Big|\, Y(t)\right] && (5.143)
\end{aligned}
\]
For values of k < TSS + dmax , the upper bound in (5.65) holds.
Consider time slots $k \ge T_{SS} + d_{\max}$, in which the interior expectation in (5.142) is expanded as
\[
\begin{aligned}
&E\left[\sum_{i=1}^{M} Q_i(t + k - \tau_Q)\big(\lambda_i - D_i(t + k)\big) \,\Big|\, S(t + k - d_{\max}), Q(t + k - \tau_Q), Y(t)\right] && (5.144) \\
&= \sum_{i=1}^{M} Q_i(t + k - \tau_Q)\Big(\lambda_i - E\big[D_i(t + k) \mid S(t + k - d_{\max}), Q(t + k - \tau_Q), Y(t)\big]\Big) && (5.145) \\
&= \sum_{i=1}^{M} Q_i(t + k - \tau_Q)\lambda_i - \sum_{i=1}^{M} Q_i(t + k - \tau_Q)\, E\big[D_i(t + k) \mid S(t + k - d_{\max}), Q(t + k - \tau_Q), Y(t)\big]. && (5.146)
\end{aligned}
\]
Consider the right-most term in equation (5.146). Let $\phi_i^r$ be a binary indicator variable denoting whether queue $i$ is scheduled under the DCPS policy as a function of the delayed QLI and delayed CSI from controller $r$, and let $\psi_r$ be an indicator variable denoting whether node $r$ is the controller, as a function of delayed QLI and globally delayed CSI. Let $Q = Q(t + k - \tau_Q)$ in the following.
\[
\begin{aligned}
&\sum_{i=1}^{M} Q_i\, E\big[D_i(t + k) \mid S(t + k - d_{\max}), Q, Y(t)\big] \\
&= \sum_{i=1}^{M} Q_i\, E\left[\sum_{r=1}^{M} \psi_r\big(S(t + k - d_{\max}), Q\big)\, \phi_i^r\big(S(t + k - d_r), Q\big)\, S_i(t + k) \,\Big|\, S(t + k - d_{\max}), Q, Y(t)\right] && (5.147) \\
&= \sum_{i=1}^{M}\sum_{r=1}^{M} \psi_r\big(S(t + k - d_{\max}), Q\big)\, Q_i\, E\Big[E\big[\phi_i^r(S(t + k - d_r), Q)\, S_i(t + k) \,\big|\, S_i(t + k - d_r(i))\big] \,\Big|\, S(t + k - d_{\max}), Q, Y(t)\Big] && (5.148) \\
&= \sum_{i=1}^{M}\sum_{r=1}^{M} \psi_r\big(S(t + k - d_{\max}), Q\big)\, Q_i\, E\Big[\phi_i^r(S(t + k - d_r), Q)\, E\big[S_i(t + k) \,\big|\, S_i(t + k - d_r(i))\big] \,\Big|\, S(t + k - d_{\max}), Q, Y(t)\Big] && (5.149) \\
&= \sum_{i=1}^{M}\sum_{r=1}^{M} \psi_r\big(S(t + k - d_{\max}), Q\big)\, Q_i\, E\Big[\phi_i^r(S(t + k - d_r), Q)\, p^{d_r(i)}_{S_i(t + k - d_r),1} \,\Big|\, S(t + k - d_{\max}), Q, Y(t)\Big] && (5.150) \\
&= \sum_{i=1}^{M}\sum_{r=1}^{M} \psi_r\big(S(t + k - d_{\max}), Q(t + k - \tau_Q)\big)\, Q_i(t + k - \tau_Q)\, E\Big[\phi_i^r\big(S(t + k - d_r), Q(t + k - \tau_Q)\big)\, p^{d_r(i)}_{S_i(t + k - d_r),1} \,\Big|\, S(t + k - d_{\max})\Big] && (5.151)
\end{aligned}
\]
Equation (5.148) follows since the controller placement under the DCPS policy is completely determined by the delayed QLI and globally delayed CSI, and then applying the law of iterated expectations. Equation (5.149) follows since the link schedule under DCPS is completely determined given delayed QLI ($Q(t + k - \tau_Q)$) and delayed CSI ($S(t + k - d_r)$). Equation (5.150) follows using the $k$-step transition probability of the Markov chain. Lastly, equation (5.151) follows because the locally delayed CSI $S(t + k - d_r)$ is conditionally independent of $Y(t)$ and $Q(t + k - \tau_Q)$ given the globally delayed CSI $S(t + k - d_{\max})$, since $k \ge d_{\max}$ and $\tau_Q > d_{\max}$. Note that we do not need a large $\tau_Q$ to remove the conditioning, as we did in Theorem 21.
Now, similarly to the proof of Theorem 21, we compare the DCPS policy to the STAT policy in Lemma 16, which is known to stabilize the system. Under the DCPS policy, the following simplification is made for the service rate given a controller and delayed CSI observation:
\[ \sum_{i=1}^{M} \phi_i^r\big(s, Q(t + k - \tau_Q)\big)\, Q_i(t + k - \tau_Q)\, p^{d_r(i)}_{s_i,1} = \max_i Q_i(t + k - \tau_Q)\, p^{d_r(i)}_{s_i,1}. \tag{5.152} \]
Similarly, the expression for the expected value of the departure process is rewritten using (5.152) and the structure of the controller placement policy of DCPS:
\[
\begin{aligned}
&\sum_{r=1}^{M} \psi_r\big(Q(t + k - \tau_Q)\big)\, E_{S(t+k-d_r)}\left[\max_i Q_i(t + k - \tau_Q)\, p^{d_r(i)}_{S_i(t+k-d_r),1} \,\Big|\, S(t + k - d_{\max})\right] \\
&= \max_r \sum_{s \in \mathcal{S}} P\big(S(t + k - d_r) = s \mid S(t + k - d_{\max})\big) \max_i Q_i(t + k - \tau_Q)\, p^{d_r(i)}_{s_i,1} && (5.153)
\end{aligned}
\]
Combining equations (5.153) and (5.151), and plugging this into equation (5.146)
yields
\[
\begin{aligned}
&E\left[\sum_{i=1}^{M} Q_i(t + k - \tau_Q)\lambda_i - \sum_{i=1}^{M} Q_i(t + k - \tau_Q)\, E\big[D_i(t + k) \mid S(t + k - d_{\max}), Q(t + k - \tau_Q)\big] \,\Big|\, Y(t)\right] \\
&\le E\left[\sum_{i=1}^{M} Q_i(t + k - \tau_Q)\lambda_i - \max_r \sum_{s \in \mathcal{S}} P\big(S(t + k - d_r) = s \mid S(t + k - d_{\max})\big) \max_i Q_i(t + k - \tau_Q)\, p^{d_r(i)}_{s_i,1} \,\Big|\, Y(t)\right] && (5.154) \\
&\le \sum_{i=1}^{M} Q_i(t - \tau_Q)\lambda_i + Mk - E\left[\max_r \sum_{s' \in \mathcal{S}} P\big(S(t + k - d_r) = s' \mid S(t + k - d_{\max})\big) \max_i Q_i(t - \tau_Q)\, p^{d_r(i)}_{s_i',1} \,\Big|\, Y(t)\right] + k && (5.155) \\
&\le \sum_{i=1}^{M} Q_i(t - \tau_Q)\lambda_i + (M+1)k \\
&\quad - \sum_{s \in \mathcal{S}} P\big(S(t + k - d_{\max}) = s \mid Y(t)\big) \max_r \sum_{s' \in \mathcal{S}} P\big(S(t + k - d_r) = s' \mid S(t + k - d_{\max}) = s\big) \max_i Q_i(t - \tau_Q)\, p^{d_r(i)}_{s_i',1} && (5.156) \\
&\le \sum_{i=1}^{M} Q_i(t - \tau_Q)\lambda_i + (M+1)k + \frac{\epsilon}{2}\sum_{i=1}^{M} Q_i(t - \tau_Q) \\
&\quad - \sum_{s \in \mathcal{S}} P\big(S(t + k - d_{\max}) = s\big) \max_r \sum_{s' \in \mathcal{S}} P\big(S(t + k - d_r) = s' \mid S(t + k - d_{\max}) = s\big) \max_i Q_i(t - \tau_Q)\, p^{d_r(i)}_{s_i',1} && (5.157)
\end{aligned}
\]
Equation (5.155) follows from (5.51). Equation (5.157) follows from the definition of $T_{SS}$ in (5.35), used to remove the conditioning on $Y(t)$. By Lemma 16, for any $\lambda \in \Lambda$, there exists a stationary policy which assigns controller $r$ with probability $\beta_r(s)$, and schedules node $i$ for transmission with probability $\alpha_i^r(s')$ for delayed channel state information $s, s' \in \mathcal{S}$, which satisfies
\[ \lambda_i + \epsilon \le \sum_{s \in \mathcal{S}} P\big(S(t - d_{\max}) = s\big) \sum_{r=1}^{M} \beta_r(s) \sum_{s' \in \mathcal{S}} P\big(S(t - d_r(i)) = s' \mid S(t - d_{\max}) = s\big)\, \alpha_i^r(s')\, p^{d_r(i)}_{s',1} \quad \forall i \in \{1, \ldots, M\} \tag{5.158} \]
Define $\mu_i^{\mathrm{STAT}}$ to be the average departure rate of queue $i$ under this stationary policy. In other words,
\[ \mu_i^{\mathrm{STAT}} \triangleq \sum_{s \in \mathcal{S}} P\big(S(t - d_{\max}) = s\big) \sum_{r=1}^{M} \beta_r(s) \sum_{s' \in \mathcal{S}} P\big(S(t - d_r(i)) = s' \mid S(t - d_{\max}) = s\big)\, \alpha_i^r(s')\, p^{d_r(i)}_{s',1} \tag{5.159} \]
The expression in (5.154) is rewritten by adding and subtracting identical terms corresponding to the stationary policy $\mu^{\mathrm{STAT}}$:
\[
\begin{aligned}
&\sum_{i=1}^{M} Q_i(t - \tau_Q)\lambda_i - E_{S(t-d_{\max})}\left[\max_r \sum_{s \in \mathcal{S}} P\big(S(t - d_r) = s \mid S(t - d_{\max})\big) \max_i Q_i(t - \tau_Q)\, p^{d_r(i)}_{s_i,1}\right] \\
&\quad + \frac{\epsilon}{2}\sum_{i=1}^{M} Q_i(t - \tau_Q) + \sum_{i=1}^{M} Q_i(t - \tau_Q)\mu_i^{\mathrm{STAT}} - \sum_{i=1}^{M} Q_i(t - \tau_Q)\mu_i^{\mathrm{STAT}} + (M+1)k && (5.160) \\
&= \sum_{i=1}^{M} Q_i(t - \tau_Q)\big(\lambda_i - \mu_i^{\mathrm{STAT}}\big) \\
&\quad - E_{S(t-d_{\max})}\left[\max_r \sum_{s \in \mathcal{S}} P\big(S(t - d_r) = s \mid S(t - d_{\max})\big) \max_i Q_i(t - \tau_Q)\, p^{d_r(i)}_{s_i,1}\right] \\
&\quad + \frac{\epsilon}{2}\sum_{i=1}^{M} Q_i(t - \tau_Q) + \sum_{i=1}^{M} Q_i(t - \tau_Q)\mu_i^{\mathrm{STAT}} + (M+1)k && (5.161)
\end{aligned}
\]
The first term in (5.161) is bounded using the fact that $\lambda \in \Lambda$ implies (5.158). Thus,
\[ \sum_{i=1}^{M} Q_i(t - \tau_Q)\big(\lambda_i - \mu_i^{\mathrm{STAT}}\big) \le -\epsilon \sum_{i=1}^{M} Q_i(t - \tau_Q). \tag{5.162} \]
The term corresponding to the stationary policy in (5.161) is bounded as follows:
\[
\begin{aligned}
\sum_{i=1}^{M} Q_i(t - \tau_Q)\mu_i^{\mathrm{STAT}} &= \sum_{s \in \mathcal{S}} P\big(S(t - d_{\max}) = s\big) \sum_{r=1}^{M} \beta_r(s) \sum_{i=1}^{M} Q_i(t - \tau_Q) \sum_{s' \in \mathcal{S}} P\big(S(t - d_r(i)) = s' \mid S(t - d_{\max}) = s\big)\, \alpha_i^r(s')\, p^{d_r(i)}_{s',1} && (5.163) \\
&= \sum_{s \in \mathcal{S}} P\big(S(t - d_{\max}) = s\big) \sum_{r=1}^{M} \beta_r(s) \sum_{s' \in \mathcal{S}} P\big(S(t - d_r(i)) = s' \mid S(t - d_{\max}) = s\big) \sum_{i=1}^{M} Q_i(t - \tau_Q)\, \alpha_i^r(s')\, p^{d_r(i)}_{s',1} && (5.164) \\
&\le \sum_{s \in \mathcal{S}} P\big(S(t - d_{\max}) = s\big) \sum_{r=1}^{M} \beta_r(s) \sum_{s' \in \mathcal{S}} P\big(S(t - d_r(i)) = s' \mid S(t - d_{\max}) = s\big) \max_i Q_i(t - \tau_Q)\, p^{d_r(i)}_{s',1} && (5.165) \\
&\le E_{S(t-d_{\max})}\left[\max_r \sum_{s' \in \mathcal{S}} P\big(S(t - d_r(i)) = s' \mid S(t - d_{\max})\big) \max_i Q_i(t - \tau_Q)\, p^{d_r(i)}_{s_i',1}\right] && (5.166)
\end{aligned}
\]
Returning to (5.141) and applying the inequalities in (5.166) and (5.162):
\[
\begin{aligned}
&E\left[E\left[\sum_{i=1}^{M} Q_i(t + k - \tau_Q)\big(\lambda_i - D_i(t + k)\big) \,\Big|\, Q(t + k - \tau_Q), S(t + k - d_{\max})\right] \Big|\, Y(t)\right] \\
&\le \sum_{i=1}^{M} Q_i(t - \tau_Q)\big(\lambda_i - \mu_i^{\mathrm{STAT}}\big) + \frac{\epsilon}{2}\sum_{i=1}^{M} Q_i(t - \tau_Q) + \sum_{i=1}^{M} Q_i(t - \tau_Q)\mu_i^{\mathrm{STAT}} + (M+1)k \\
&\quad - E_S\left[\max_r \sum_{s \in \mathcal{S}} P\big(S(t - d_r) = s \mid S(t - d_{\max})\big) \max_i Q_i(t - \tau_Q)\, p^{d_r(i)}_{s_i,1}\right] && (5.167) \\
&\le -\frac{\epsilon}{2}\sum_{i=1}^{M} Q_i(t - \tau_Q) + (M+1)k && (5.168)
\end{aligned}
\]
This new bound applies to slots $k \ge T_{SS} + d_{\max}$, and can be combined with the bound in (5.65) to bound the drift term in (5.143):
\[
\begin{aligned}
\Delta_T(Y(t)) &\le B' + (T_{SS} + d_{\max}) \sum_{i=1}^{M} Q_i(t - \tau_Q) + \frac{1}{2}(T_{SS} + d_{\max})^2 M \\
&\quad + \sum_{k=T_{SS}+d_{\max}}^{T} E\left[-\frac{\epsilon}{2}\sum_{i=1}^{M} Q_i(t - \tau_Q) + (M+1)k \,\Big|\, Y(t)\right] && (5.169) \\
&\le B' + (T_{SS} + d_{\max}) \sum_{i=1}^{M} Q_i(t - \tau_Q) + \frac{1}{2}(T_{SS} + d_{\max})^2 M \\
&\quad - (T - T_{SS} - d_{\max})\frac{\epsilon}{2}\sum_{i=1}^{M} Q_i(t - \tau_Q) + \sum_{k=T_{SS}+d_{\max}}^{T-1} (M+1)k && (5.170) \\
&\le B' + (T_{SS} + d_{\max}) \sum_{i=1}^{M} Q_i(t - \tau_Q) + \frac{1}{2}(T_{SS} + d_{\max})^2 M \\
&\quad - (T - T_{SS} - d_{\max})\frac{\epsilon}{2}\sum_{i=1}^{M} Q_i(t - \tau_Q) + \frac{1}{2}(M+1)\big(T^2 - (T_{SS} - d_{\max})^2\big) && (5.171) \\
&\le B' + \frac{1}{2}(T_{SS} + d_{\max})^2 M \Big(1 - \frac{\epsilon}{2}\Big) + \frac{1}{2}(M+1)T^2 \\
&\quad + (T_{SS} + d_{\max}) \sum_{i=1}^{M} Q_i(t - \tau_Q) - (T - T_{SS} - d_{\max})\frac{\epsilon}{2}\sum_{i=1}^{M} Q_i(t - \tau_Q) && (5.172)
\end{aligned}
\]
Thus, for any $\xi > 0$, $T$ satisfying
\[ T \ge \frac{2\big(1 + \frac{\epsilon}{2}\big)(T_{SS} + d_{\max}) + 2\xi}{\epsilon} \tag{5.173} \]
and positive constant $K$ satisfying
\[ K = B' + \frac{1}{2}(T_{SS} + d_{\max})^2 M \Big(1 - \frac{\epsilon}{2}\Big) + \frac{1}{2}(M+1)T^2, \tag{5.174} \]
it follows that
\[ \Delta_T(Y(t)) \le K - \xi \sum_{i=1}^{M} Q_i(t - \tau_Q). \tag{5.175} \]
Thus, for large enough queue backlogs, the T -slot Lyapunov drift is negative, and
from [48] it follows that the overall system is stable.
Chapter 6

Scheduling over Time Varying Channels with Hidden State Information
Consider the scheduling problem in a wireless downlink where channel state information (CSI) is unavailable at the base station, as in Figure 6-1. Packets arrive to
the base station and are placed in queues to await transmission to their respective
destinations. Due to wireless interference, only one transmission can be scheduled
in each time slot. Therefore, the base station must schedule transmissions such that
the queue lengths at the base station remain stable. Furthermore, the channels to
each user are independent, but evolve over time according to a Markov process. Ideally, the transmitter opportunistically schedules channels yielding a high transmission
rate; however, CSI is not available to the transmitter.
Throughput optimal scheduling was pioneered by Tassiulas and Ephremides in
[62], and has been studied in a variety of contexts. The optimal policies depend on
the channel model and the information available to the transmitter, as summarized in
Table 6.1. If the channel state process is IID, and no CSI is available, then any work-conserving policy is throughput optimal; for the purpose of comparison, we define
the throughput optimal policy in this scenario to be that which schedules the longest
queue. If the transmitter has current CSI and queue length information (QLI), the
throughput optimal policy is to transmit over the channel that maximizes the product
of the channel rate and the queue length at the current time [48, 62]. If the CSI and
QLI are delayed, Ying and Shakkottai show that the optimal policy schedules the
node with the largest product of the delayed QLI and the conditional expectation
of the channel rate at the current time, given the delayed CSI [69]. If the CSI is
not acquired until an acknowledgement is received from the transmission, then the
throughput optimal policy is to transmit over the channel that maximizes the product
of the belief of the channel and the queue backlog [32].
While throughput optimal scheduling has been studied in a variety of contexts,
to the best of our knowledge, there have been no results on throughput optimal
scheduling when the controller has QLI but not CSI, and the channel process has
memory. In fact, Tassiulas and Ephremides state that, "An interesting variation of
the problem... is the case where the connectivity information is not available for the
decision making and the server allocation can be based on queue lengths... The study
of stability and optimal delay performance in [the] case of dependent connectivities
are open problems for further investigation.” [62]. Due to the memory in the channel
state process, the throughput-optimal policy takes a non-trivial form, and the results
from [62] and [48] cannot be directly applied.
In this chapter, we consider a scenario in which QLI is readily available to the
transmitter, but no CSI is available. In this case, the throughput optimal policy is
to schedule the node with the longest queue length, using significantly delayed QLI.
We characterize the throughput region for the case when CSI is not available at the
transmitter. Then, we propose the Delayed Longest Queue (DLQ) policy, and
prove it is throughput optimal over all transmission policies without access to CSI.
Lastly, we provide simulation results to support the theoretical results of delayed QLI
optimality.
6.1  System Model
Consider a system of M nodes, representing a wireless downlink, as in Figure 6-1.
Packets arrive externally at the base station, and are destined for node i according
to an i.i.d. Bernoulli arrival process Ai (t) of rate λi . Packets are stored in a separate
Model        | IID Channels              | Markov Channels
-------------|---------------------------|-----------------------------------
No CSI       | max_i Q_i(t) E[S_i(t)]    | *This work*
Delayed CSI  | max_i Q_i(t) E[S_i(t)]    | max_i Q_i(t-τ) E[S_i(t) | S_i(t-τ)]
Full CSI     | max_i Q_i(t) S_i(t)       | max_i Q_i(t) S_i(t)

Table 6.1: Throughput optimal policies for different system models. Each row corresponds to a different amount of information at the controller; each column corresponds to the memory in the channel. S(t) is the channel state at the current slot, and Q(t) is the queue backlog.
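The rules in Table 6.1 differ only in the weight paired with each backlog. As a hedged sketch (the function names here are mine, not the thesis's), all three information models reduce to one generic max-weight selection with a different weight function:

```python
def max_weight_schedule(Q, weight):
    # generic max-weight rule: serve the queue maximizing Q_i * weight(i)
    return max(range(len(Q)), key=lambda i: Q[i] * weight(i))

# Full CSI: the weight is the current channel state S_i(t) itself.
full_csi = lambda Q, S: max_weight_schedule(Q, lambda i: S[i])

# No CSI, IID channels: the weight is the mean channel rate E[S_i(t)].
no_csi_iid = lambda Q, ES: max_weight_schedule(Q, lambda i: ES[i])

# Delayed CSI, Markov channels: the weight is E[S_i(t) | S_i(t - tau)],
# paired with the correspondingly delayed backlog Q(t - tau).
delayed_csi = lambda Q_delayed, cond_ES: max_weight_schedule(Q_delayed,
                                                             lambda i: cond_ES[i])
```

The open cell of the table, Markov channels with no CSI, is exactly the case this chapter fills: there the weight cannot depend on any channel observation, and the chapter shows the right choice is the delayed backlog alone.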
Figure 6-1: Wireless downlink. Packets destined for node i arrive at the base station (BS) at rate λ_i and wait in queue Q_i(t); the channel to receiver R_i has state S_i(t).
queue at the base station, based on the destination node, to await transmission.
Let Qi (t) be the packet backlog corresponding to node i at time t. Due to wireless
interference, the base station is able to transmit to only one node at a time, although
this model can easily be extended to allow for multiple transmissions per slot.
Each node is connected to the base station through an independent time-varying
channel. Let Si (t) ∈ {OFF, ON} be the channel state of the channel at node i at
time t. Assume the channel states evolve over time according to a Markov chain,
shown in Figure 6-2. If a packet for node i is scheduled for transmission, and Si (t) =
ON, then the packet is successfully transmitted, assuming there are packets awaiting
transmission, and that packet departs the system. On the other hand, if the channel
at node i is OFF, then the transmission fails, and the packet remains in the system.
Let PSi (1) and PSi (0) be the steady state probability of channel i being ON or OFF
respectively.
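The ON-OFF channel just described can be simulated directly. The sketch below (function name and parameters are illustrative) estimates the steady-state ON probability, which for the chain of Figure 6-2 is P_S(1) = p/(p+q):

```python
import random

def simulate_channel(p, q, T, s0=0, rng=None):
    # sample a two-state Markov channel: OFF->ON w.p. p, ON->OFF w.p. q
    rng = rng or random.Random(0)   # fixed seed for reproducibility
    s, on_slots = s0, 0
    for _ in range(T):
        if s == 0:
            s = 1 if rng.random() < p else 0
        else:
            s = 0 if rng.random() < q else 1
        on_slots += s
    return on_slots / T             # empirical estimate of P_S(1)

# symmetric chain p = q has stationary ON probability 1/2
est = simulate_channel(p=0.1, q=0.1, T=200_000)
```

With p = q, the estimate converges to 1/2, the value used in Section 6.4 to locate the boundary of the stability region.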
Figure 6-2: Markov chain describing the channel state evolution of each independent channel, with transition probability p from OFF to ON and q from ON to OFF. State 0 corresponds to an OFF channel, while state 1 corresponds to an ON channel.

The base station has access to the history of queue lengths for each node i; however, the current channel states are known only by the respective receivers, not by the base station. Therefore, the base station makes a transmission decision based on
QLI, but not CSI¹. Let Π be the set of transmission policies which do not use CSI.
The primary objective is to schedule transmissions such that the system of queues is
stable. In this work, we characterize the throughput region of the system above, and
propose a throughput optimal scheduling policy using delayed QLI.
6.2  Throughput Region
The throughput region is computed by solving the following linear program (LP):
\[
\begin{aligned}
\text{Maximize:} \quad & \epsilon \\
\text{Subject to:} \quad & \lambda_i + \epsilon \le \alpha_i P_{S_i}(1) \quad \forall i \in \{1, \ldots, M\} \\
& \sum_{i=1}^{M} \alpha_i \le 1 \\
& \alpha_i \ge 0 \quad \forall i \in \{1, \ldots, M\}
\end{aligned} \tag{6.1}
\]
In the above LP, $\alpha_i$ represents the fraction of time the base station schedules node $i$ for transmission. To maintain stability, the arrival rate to each queue must be less than the service rate at that queue, which is a function of $\alpha_i$ and the statistics of the channel. Thus, the throughput region, $\Lambda$, is the set of all non-negative arrival rate vectors $\lambda$ such that there exists a feasible solution to (6.1) for which $\epsilon^* \ge 0$. The proof that $\Lambda$ is the throughput region is given below.

¹ We assume packet acknowledgements occur at a separate layer, and cannot be used to predict the channel state.
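Because each constraint in (6.1) is tight at the optimum, the LP admits a closed form: taking $\alpha_i = (\lambda_i + \epsilon)/P_{S_i}(1)$ and saturating the budget $\sum_i \alpha_i = 1$ gives $\epsilon^* = \big(1 - \sum_i \lambda_i / P_{S_i}(1)\big) / \sum_i 1/P_{S_i}(1)$. The small sketch below (helper names are mine) uses this to test membership in $\Lambda$:

```python
def lp_epsilon_star(lams, p_on):
    # eps* = (1 - sum(lam_i / P_i)) / sum(1 / P_i), from alpha_i = (lam_i + eps)/P_i
    # and the time budget sum(alpha_i) = 1
    used = sum(l / p for l, p in zip(lams, p_on))  # time fraction already required
    cost = sum(1.0 / p for p in p_on)              # marginal time per unit of eps
    return (1.0 - used) / cost

def in_throughput_region(lams, p_on):
    # lambda lies in Lambda iff the LP has a solution with eps* >= 0
    return lp_epsilon_star(lams, p_on) >= 0
```

For the symmetric four-queue example of Section 6.4 ($P_{S_i}(1) = 1/2$), the boundary rate per queue is 0.125, matching the $\frac{1}{2}\cdot\frac{1}{4}$ computation there.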
Theorem 25 (Throughput Region). For any non-negative arrival rate vector λ, the
system can be stabilized by some policy P ∈ Π if and only if λ ∈ Λ.
Necessity is shown in Lemma 17, and sufficiency is shown in Theorem 26 by
proposing a throughput optimal scheduling policy, and proving that for all λ ∈ Λ,
that policy stabilizes the system.
Lemma 17. Suppose there exists a scheduling policy $P \in \Pi$ that stabilizes the system without using CSI. Then there exist $\alpha_i$ such that (6.1) has a solution with $\epsilon^* \ge 0$.
Proof. Consider the stabilizing policy $P \in \Pi$, consisting of control functions $\alpha_i(t)$ which choose a link to activate at each time. Note that this policy must be independent of CSI. Without loss of generality, let $\alpha_i(t)$ be an indicator function equal to 1 if link $i$ is scheduled for transmission at time $t$. Under any such scheme, the following relationship holds between arrivals, departures, and backlogs for each queue:
\[ \sum_{\tau=1}^{t} A_i(\tau) \le Q_i(t) + \sum_{\tau=1}^{t} \mu_i(\alpha_i(\tau)), \tag{6.2} \]
where $\mu_i$ is the service rate of the $i$th queue as a function of the control decisions. Writing out the expression for $\mu_i$ in terms of the decision variables $\alpha_i(t)$ yields
\[ \sum_{\tau=1}^{t} A_i(\tau) \le Q_i(t) + \sum_{\tau=1}^{t} \alpha_i(\tau)\, P_{S_i}(1). \tag{6.3} \]
Since the arrival and the channel state processes are ergodic, and the number of channel states and queues is finite, there exists a time $t_1$ such that for all $t \ge t_1$, the empirical average arrival rate is within $\frac{\epsilon}{2}$ of its expectation:
\[ \frac{1}{t}\sum_{\tau=1}^{t} A_i(\tau) \ge \lambda_i - \frac{\epsilon}{2} \tag{6.4} \]
The above holds with probability 1 by the strong law of large numbers. Furthermore, since the system is stable under the policy $P$, [48] shows that there exists a $V$ such that for an arbitrarily large $t$,
\[ P\left(\sum_{i=1}^{M} Q_i(t) \le V\right) \ge \frac{1}{2}. \tag{6.5} \]
Thus, let $t$ be a large time index such that $t \ge t_1$ and $\frac{V}{t} \le \frac{\epsilon}{2}$. If $\sum_{i=1}^{M} Q_i(t) \le V$, the inequality in (6.3) can be rewritten by dividing by $t$:
\[ \frac{1}{t}\sum_{\tau=1}^{t} A_i(\tau) \le \frac{V}{t} + \frac{1}{t}\sum_{\tau=1}^{t} \alpha_i(\tau)\, P_{S_i}(1) \tag{6.6} \]
\[ \lambda_i - \frac{\epsilon}{2} \le \frac{1}{t}\sum_{\tau=1}^{t} A_i(\tau) \le \frac{\epsilon}{2} + \frac{1}{t}\sum_{\tau=1}^{t} \alpha_i(\tau)\, P_{S_i}(1) \tag{6.7} \]
\[ \lambda_i - \epsilon \le \bar{\alpha}_i\, P_{S_i}(1) \tag{6.8} \]
The lower bound in (6.7) follows from (6.4), and equation (6.8) follows from defining $\bar{\alpha}_i = \frac{1}{t}\sum_{\tau=1}^{t} \alpha_i(\tau)$. Inequality (6.8) assumes $\sum_{i=1}^{M} Q_i(t) \le V$, and holds with probability greater than $\frac{1}{2}$ by (6.5), implying that there exists a set of stationary control decisions $\alpha_i$ satisfying the necessary constraints such that (6.8) holds for all $i$. If there were no such stationary policy, then this inequality would hold with probability 0. Therefore, $\lambda$ is arbitrarily close to a point in the region $\Lambda$, implying the constraints imposed by $\Lambda$ are necessary for system stability.
Lemma 17 shows that for all $\lambda \in \Lambda$, there exists a stationary policy STAT $\in \Pi$ that stabilizes the system by scheduling link $i$ with probability $\alpha_i$. However, the correct value of $\alpha_i$ relies on knowledge of the arrival rates to the system. In the following section, we develop a scheduling policy based on delayed QLI that stabilizes the system without requiring knowledge of the arrival rates.
6.3  Dynamic QLI-Based Scheduling Policy
Consider a scheduling policy $P \in \Pi$. Let $D_i^P(t)$ be the departure process of queue $i$, such that $D_i^P(t) = 1$ if there is a departure from queue $i$ at time $t$ under policy $P$. Consider the evolution of the queues over $T$ time slots subject to a scheduling policy $P$:
\[ Q_i(t+T) \le \left(Q_i(t) - \sum_{k=0}^{T-1} D_i^P(t+k)\right)^{\!+} + \sum_{k=0}^{T-1} A_i(t+k) \tag{6.9} \]
Equation (6.9) is an inequality rather than an equality due to the assumption that the departures are taken from the backlog at the beginning of the $T$-slot period, and the arrivals occur at the end of the $T$ slots. Under this assumption, the packets that arrive within the $T$-slot period cannot depart within this period. The square of the queue backlog can be bounded using the inequality in (6.9):
\[ Q_i^2(t+T) \le Q_i^2(t) + \left(\sum_{k=0}^{T-1} A_i(t+k)\right)^{\!2} + \left(\sum_{k=0}^{T-1} D_i^P(t+k)\right)^{\!2} + 2 Q_i(t)\left(\sum_{k=0}^{T-1} A_i(t+k) - \sum_{k=0}^{T-1} D_i^P(t+k)\right) \tag{6.10} \]
The above bound follows from $A_i(t) \ge 0$ and $D_i(t) \ge 0$. Due to the ergodicity of the finite-state Markov chain controlling the channel state process, for any $\epsilon > 0$, there exists a $\tau_Q$ such that the probability of the channel state conditioned on the channel state $\tau_Q$ slots in the past is within $\frac{\epsilon}{2}$ of the steady state probability of the Markov chain:
\[ \Big| P\big(S(t) = s \mid S(t - \tau_Q(\epsilon))\big) - P\big(S(t) = s\big) \Big| \le \frac{\epsilon}{2} \tag{6.11} \]
Note that $\tau_Q(\epsilon)$ is related to the mixing time of the Markov chain. In general, the Markov chain approaches steady state exponentially fast, at a rate of $p + q$ [21].
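For the two-state chain, the conditional probability in (6.11) has the closed form $P(S(t) = 1 \mid S(t-k) = s) = \pi + (s - \pi)(1 - p - q)^k$ with $\pi = p/(p+q)$, so the deviation from steady state decays geometrically in $|1 - p - q|$. A sketch of computing $\tau_Q(\epsilon)$ from this decay (the function name is an assumption):

```python
def tau_q(eps, p, q):
    # smallest k such that |P(S(t)=s' | S(t-k)=s) - P(S(t)=s')| <= eps/2
    # for all starting and target states s, s'
    pi = p / (p + q)                   # stationary ON probability
    rho = abs(1.0 - p - q)             # geometric decay factor toward steady state
    k = 1
    # worst-case deviation after k steps is max(pi, 1 - pi) * rho**k
    while max(pi, 1.0 - pi) * rho ** k > eps / 2.0:
        k += 1
    return k
```

As the text notes, when $p + q$ approaches 1 the decay factor vanishes and $\tau_Q$ collapses to a single slot; slowly mixing chains (small $p + q$) require a much larger delay, matching the simulation results of Section 6.4.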
Theorem 26 proposes the Delayed Longest Queue (DLQ) scheduling policy, which stabilizes the network whenever the input rate vector is interior to the capacity region $\Lambda$. Note that this proves sufficiency in Theorem 25.
Theorem 26. Consider the Delayed Longest Queue (DLQ) scheduling policy, which at time $t$ schedules the channel which had the longest queue length at time $t - \tau_Q(\epsilon)$, where $\tau_Q(\epsilon)$ is defined in (6.11). For any arrival rate $\lambda$ and $\epsilon > 0$ satisfying $\lambda + \epsilon\mathbf{1} \in \Lambda$, the DLQ policy stabilizes the system.
The DLQ policy transmits a packet from the longest queue using delayed queue length information. Perhaps surprisingly, fresher QLI cannot be used by the DLQ policy to stabilize the system. This is because at time $t$, the queue with the largest backlog $Q_i(t)$ is also likely to have an OFF channel: scheduling the longest queue targets channels that are OFF, so the queue backlogs are not decreased, and the system grows unstable. On the other hand, if sufficiently delayed QLI is used in the DLQ policy, then the QLI is independent of the current channel state, because the state process reaches its steady-state distribution over the $\tau_Q$ slots that the QLI is delayed. Therefore, the base station schedules queues for which the backlog is long, without favoring OFF channels.
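A minimal simulation of the downlink under DLQ makes this effect visible. All names and parameter choices below are illustrative sketches, not the thesis's code: the scheduler serves the queue that was longest `tau_q` slots ago, a transmission succeeds only when the chosen channel is currently ON, and arrivals are Bernoulli.

```python
import random

def run_dlq(lam, tau_q, p, q, M=4, T=100_000, seed=1):
    """Simulate M ON-OFF downlink channels under the DLQ policy and
    return the time-average total backlog."""
    rng = random.Random(seed)
    Q = [0] * M
    S = [rng.random() < p / (p + q) for _ in range(M)]  # start near steady state
    history = [list(Q)]                                 # ring of past Q(t), newest last
    total_backlog = 0
    for _ in range(T):
        # schedule using delayed QLI (fall back to the oldest slot available)
        Q_delayed = history[max(0, len(history) - 1 - tau_q)]
        i = max(range(M), key=lambda j: Q_delayed[j])
        if S[i] and Q[i] > 0:           # success only if the chosen channel is ON
            Q[i] -= 1
        for j in range(M):              # Bernoulli arrivals, then channel evolution
            Q[j] += rng.random() < lam
            S[j] = (rng.random() < p) if not S[j] else (rng.random() >= q)
        history.append(list(Q))
        if len(history) > tau_q + 1:
            history.pop(0)
        total_backlog += sum(Q)
    return total_backlog / T
```

With slowly mixing channels (e.g., p = q = 0.01) and an arrival rate inside the region, `run_dlq` with a large delay such as `tau_q=150` keeps the average backlog small, while `tau_q=1` chases OFF channels and the backlog grows, mirroring the behavior reported in Section 6.4.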
6.4  Simulation Results
In this section, we simulate a system of four queues, and apply the DLQ policy for different values of the QLI delay $\tau_Q$. We plot the average queue backlog over 100,000 time slots for different symmetric arrival rates². For small arrival rates, the average queue length remains small. As the arrival rate increases, the backlog slowly increases until a certain point, after which the backlog greatly increases. This point represents the boundary of the throughput region, and for arrival rates outside of this region, the system of queues cannot be stabilized.
For a system of four queues with symmetric channel transition probabilities $p = q$, the boundary of the stability region on the symmetric arrival rate line is given by $\frac{1}{2} \cdot \frac{1}{4} = 0.125$. Therefore, under the throughput optimal policy, the queue lengths should remain bounded for arrival rates $\lambda < 0.125$.

² A symmetric arrival rate implies that each node sees the same arrival rate.
Figure 6-3 shows the results for transition probabilities p = q = 0.01, and Figure
6-4 shows the results for p = q = 0.1. As shown in Figure 6-3, when the QLI
is insufficiently delayed, the system becomes unstable before the boundary of the
stability region (0.125). For τQ = 1, the system becomes unstable at λ = 0.03. This
represents a 75% reduction in the stability region. As τQ increases, the maximum
arrival rate supportable by the DLQ policy increases. At τQ = 150, it is clear the
system becomes stable for all arrival rates within the stability region.
Similar results are shown in Figure 6-4 for a channel with less memory. In this
case, the attainable throughput of the DLQ policy is less sensitive to the magnitude
of the delays in QLI. The simulation results suggest that τQ = 100 is sufficient to
achieve the full throughput region. The magnitude of the QLI delay required for
throughput optimality is smaller due to the channel state having a smaller mixing
time.
Figure 6-3: Symmetric arrival rate versus average queue backlog for a 4-queue system under different DLQ policies. Transition probabilities satisfy p = q = 0.01.
Figure 6-4: Symmetric arrival rate versus average queue backlog for a 4-queue system under different DLQ policies. Transition probabilities satisfy p = q = 0.1.
6.5  Conclusion
In this chapter, we designed a throughput optimal scheduling policy for a system
in which channel states evolve over time according to a Markov process, and QLI
is available to the scheduler but not CSI. We proved that the throughput optimal
policy uses delayed QLI rather than current QLI, in contrast to the case where CSI
is available to the transmitter. The required delay on the QLI depends on the mixing
time of the channel state process. In general, the Markov channel state approaches
steady state geometrically, with convergence governed by the second eigenvalue
1 − (p + q). As p + q approaches 1, the Markov process approaches an IID process,
and current QLI can be used. However, further delaying the QLI does not affect the
overall throughput region; the drawback of using delayed QLI is increased packet
delay. Therefore, if no CSI is available to the base
station, the optimal policy must trade off between throughput and delay.
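The dependence of the required delay on the mixing time can be made concrete for a two-state channel. The sketch below is illustrative (the formal definition of τQ(ε) is in (6.11)); it computes the smallest delay for which every conditional state distribution is within ε/2 of the stationary distribution, using the fact that the deviation decays geometrically as |1 − p − q|^τ:

```python
def required_delay(p, q, eps):
    """Smallest tau such that the two-state chain's tau-step conditional
    state distribution is within eps/2 of stationary, from either start
    state.  For P(OFF->ON) = p and P(ON->OFF) = q, the exact deviation
    after tau steps is max(pi_ON, pi_OFF) * |1 - p - q|**tau."""
    pi_on = p / (p + q)                 # stationary probability of ON
    r = abs(1.0 - p - q)                # second eigenvalue magnitude
    tau, dev = 0, max(pi_on, 1.0 - pi_on)
    while dev > eps / 2:
        tau += 1
        dev *= r                        # geometric decay per step
    return tau
```

For ε = 0.1 this gives a delay on the order of a hundred slots when p = q = 0.01, but only about ten slots when p = q = 0.1, matching the qualitative gap between Figures 6-3 and 6-4.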
6.6 Appendix

6.6.1 Proof of Theorem 26
Theorem 26: Consider the Delayed Longest Queue (DLQ) scheduling policy, which at
time t schedules the channel which had the longest queue length at time (t − τQ(ε)),
where τQ(ε) is defined in (6.11). For any arrival rate λ and ε > 0 satisfying
λ + ε1 ∈ Λ, the DLQ policy stabilizes the system.

Proof of Theorem 26. Let τQ = τQ(ε), where the dependence on ε is clear. Let Y(t)
be the history of queue lengths in the system up to time t:

$$\mathbf{Y}(t) = \{\mathbf{Q}(0), \ldots, \mathbf{Q}(t)\}. \tag{6.12}$$
The vector of delayed QLI forms a Markov chain. Define the following quadratic
Lyapunov function:

$$L(\mathbf{Q}(t)) = \frac{1}{2}\sum_{i=1}^{M} Q_i^2(t). \tag{6.13}$$
The T-step Lyapunov drift is computed as

$$\Delta_T(\mathbf{Y}(t)) = \mathbb{E}\big[L(\mathbf{Q}(t+T)) - L(\mathbf{Q}(t)) \,\big|\, \mathbf{Y}(t)\big]. \tag{6.14}$$
We show that under the DLQ policy, the T-step Lyapunov drift is negative for
large backlogs, implying stability of the system under the DLQ policy for all arrival
rates within Λ, which follows from the Foster-Lyapunov criterion [49]. We bound the
Lyapunov drift by combining (6.10), (6.13), and (6.14), and then show that for large
queue lengths this upper bound is negative. Let $D_i(t) = D_i^{\mathrm{DLQ}}(t)$ denote the
departure process under the DLQ policy.
$$\Delta_T(\mathbf{Y}(t)) \le \mathbb{E}\Bigg[\sum_{i=1}^{M}\Bigg(\frac{1}{2}\Big(\sum_{k=0}^{T-1} A_i(t+k)\Big)^{2} + \frac{1}{2}\Big(\sum_{k=0}^{T-1} D_i(t+k)\Big)^{2} + Q_i(t)\Big(\sum_{k=0}^{T-1} A_i(t+k) - \sum_{k=0}^{T-1} D_i(t+k)\Big)\Bigg) \,\Bigg|\, \mathbf{Y}(t)\Bigg] \tag{6.15}$$

$$\le B + \mathbb{E}\Bigg[\sum_{i=1}^{M} Q_i(t)\Big(\sum_{k=0}^{T-1} A_i(t+k) - \sum_{k=0}^{T-1} D_i(t+k)\Big) \,\Bigg|\, \mathbf{Y}(t)\Bigg] \tag{6.16}$$
where B is a finite constant, which exists due to the boundedness of the second
moment of the arrival process.

The difference between queue lengths at any two times t and s is bounded using
the following inequality:

$$|Q_i(t) - Q_i(s)| \le |t - s|, \tag{6.17}$$

which holds because the worst case is an arrival in every slot with no departures, or
vice versa. This inequality establishes a relationship between current queue lengths
and delayed queue lengths:

$$Q_i(t) \le Q_i(t + k - \tau_Q) + |k - \tau_Q| \tag{6.18}$$

$$Q_i(t) \ge Q_i(t + k - \tau_Q) - |k - \tau_Q| \tag{6.19}$$
The inequalities in (6.18) and (6.19) are used in (6.16) to upper bound the Lyapunov
drift in terms of the delayed QLI for each slot, Qi (t + k − τQ ).
$$\Delta_T(\mathbf{Y}(t)) \le B + \mathbb{E}\Bigg[\sum_{k=0}^{T-1}\sum_{i=1}^{M} Q_i(t)A_i(t+k) - \sum_{k=0}^{T-1}\sum_{i=1}^{M} Q_i(t)D_i(t+k) \,\Bigg|\, \mathbf{Y}(t)\Bigg] \tag{6.20}$$

$$\le B + \mathbb{E}\Bigg[\sum_{k=0}^{T-1}\sum_{i=1}^{M}\big(Q_i(t+k-\tau_Q) + |k-\tau_Q|\big)A_i(t+k) - \sum_{k=0}^{T-1}\sum_{i=1}^{M}\big(Q_i(t+k-\tau_Q) - |k-\tau_Q|\big)D_i(t+k) \,\Bigg|\, \mathbf{Y}(t)\Bigg] \tag{6.21}$$

$$= B + \sum_{k=0}^{T-1}|k-\tau_Q|\,\mathbb{E}\Bigg[\sum_{i=1}^{M}\big(A_i(t+k) + D_i(t+k)\big) \,\Bigg|\, \mathbf{Y}(t)\Bigg] + \mathbb{E}\Bigg[\sum_{k=0}^{T-1}\sum_{i=1}^{M} Q_i(t+k-\tau_Q)A_i(t+k) - \sum_{k=0}^{T-1}\sum_{i=1}^{M} Q_i(t+k-\tau_Q)D_i(t+k) \,\Bigg|\, \mathbf{Y}(t)\Bigg] \tag{6.22}$$

$$\le B + 2M\sum_{k=0}^{T-1}|k-\tau_Q| + \mathbb{E}\Bigg[\sum_{k=0}^{T-1}\sum_{i=1}^{M} Q_i(t+k-\tau_Q)\big(A_i(t+k) - D_i(t+k)\big) \,\Bigg|\, \mathbf{Y}(t)\Bigg] \tag{6.23}$$

$$\le B + 2MT^2 + \mathbb{E}\Bigg[\sum_{k=0}^{T-1}\sum_{i=1}^{M} Q_i(t+k-\tau_Q)\big(A_i(t+k) - D_i(t+k)\big) \,\Bigg|\, \mathbf{Y}(t)\Bigg] \tag{6.24}$$

$$\le B' + \mathbb{E}\Bigg[\sum_{k=0}^{T-1}\sum_{i=1}^{M} Q_i(t+k-\tau_Q)\big(\lambda_i - D_i(t+k)\big) \,\Bigg|\, \mathbf{Y}(t)\Bigg] \tag{6.25}$$
Equation (6.23) follows from upper bounding the per-slot arrival and departure
rates each by 1. Equation (6.24) follows from the fact that T ≥ τQ. Equation (6.25)
follows by defining B′ = B + 2MT² and using E[Ai(t + k)] = λi. Now consider
the last term on the right hand side of (6.25). This expectation can be rewritten by
conditioning on the delayed QLI at slot t + k and using the law of iterated
expectations, given in (5.60).
$$\mathbb{E}\Bigg[\sum_{k=0}^{T-1}\sum_{i=1}^{M} Q_i(t+k-\tau_Q)\big(\lambda_i - D_i(t+k)\big) \,\Bigg|\, \mathbf{Y}(t)\Bigg] = \mathbb{E}\Bigg[\sum_{k=0}^{T-1}\mathbb{E}\Big[\sum_{i=1}^{M} Q_i(t+k-\tau_Q)\big(\lambda_i - D_i(t+k)\big) \,\Big|\, \mathbf{Q}(t+k-\tau_Q)\Big] \,\Bigg|\, \mathbf{Y}(t)\Bigg] \tag{6.26}$$
To bound (6.26), we require the channel state at slot t + k to be independent
of Y(t), which only holds in slots where k is sufficiently large. Thus, we break the
summation in (6.25) into two parts: a small number of slots for which k is small,
and a larger number of slots for which k is large. A trivially conservative bound is
used for k < τQ, but the frame size is chosen to ensure the first τQ slots are a small
fraction of the overall T slots.
$$\Delta_T(\mathbf{Y}(t)) \le B' + \sum_{k=0}^{T-1}\mathbb{E}\Bigg[\sum_{i=1}^{M} Q_i(t+k-\tau_Q)\big(\lambda_i - D_i(t+k)\big) \,\Bigg|\, \mathbf{Y}(t)\Bigg]$$

$$\le B' + \sum_{k=0}^{\tau_Q-1}\mathbb{E}\Bigg[\sum_{i=1}^{M} Q_i(t+k-\tau_Q)\big(\lambda_i - D_i(t+k)\big) \,\Bigg|\, \mathbf{Y}(t)\Bigg] + \sum_{k=\tau_Q}^{T-1}\mathbb{E}\Bigg[\sum_{i=1}^{M} Q_i(t+k-\tau_Q)\big(\lambda_i - D_i(t+k)\big) \,\Bigg|\, \mathbf{Y}(t)\Bigg] \tag{6.27}$$
For values of k < τQ , the upper bound follows by trivially upper bounding the arrival
rate by 1 and lower bounding the departures by 0 in each slot.
$$\sum_{k=0}^{\tau_Q-1}\mathbb{E}\Bigg[\sum_{i=1}^{M} Q_i(t+k-\tau_Q)\big(\lambda_i - D_i(t+k)\big) \,\Bigg|\, \mathbf{Y}(t)\Bigg] \le \sum_{k=0}^{\tau_Q-1}\sum_{i=1}^{M}\mathbb{E}\big[Q_i(t+k-\tau_Q) \,\big|\, \mathbf{Y}(t)\big] \tag{6.28}$$

$$\le \sum_{k=0}^{\tau_Q-1}\sum_{i=1}^{M} Q_i(t-\tau_Q) + \sum_{k=0}^{\tau_Q-1}\sum_{i=1}^{M} k \tag{6.29}$$

$$\le \tau_Q\sum_{i=1}^{M} Q_i(t-\tau_Q) + \frac{1}{2}\tau_Q^2 M \tag{6.30}$$
where (6.29) follows from (6.17).
Now consider slots for which k ≥ τQ . Let φi be a binary indicator variable denoting
whether queue i is scheduled under the DLQ policy as a function of the delayed QLI.
For these time-slots, we evaluate the expected departure rate, and compare it to the
departure rate of the stationary policy in Lemma 17, which we know stabilizes the
system. The interior expectation in (6.26) is expanded as
$$\mathbb{E}\Big[\sum_{i=1}^{M} Q_i(t+k-\tau_Q)\big(\lambda_i - D_i(t+k)\big) \,\Big|\, \mathbf{Q}(t+k-\tau_Q), \mathbf{Y}(t)\Big] \tag{6.31}$$

$$= \sum_{i=1}^{M} Q_i(t+k-\tau_Q)\Big(\lambda_i - \mathbb{E}\big[D_i(t+k) \,\big|\, \mathbf{Q}(t+k-\tau_Q), \mathbf{Y}(t)\big]\Big) \tag{6.32}$$

$$= \sum_{i=1}^{M} Q_i(t+k-\tau_Q)\lambda_i - \sum_{i=1}^{M} Q_i(t+k-\tau_Q)\,\mathbb{E}\big[\phi_i\big(\mathbf{Q}(t+k-\tau_Q)\big)S_i(t+k) \,\big|\, \mathbf{Q}(t+k-\tau_Q), \mathbf{Y}(t)\big] \tag{6.33}$$

$$= \sum_{i=1}^{M} Q_i(t+k-\tau_Q)\lambda_i - \sum_{i=1}^{M} Q_i(t+k-\tau_Q)\,\phi_i\big(\mathbf{Q}(t+k-\tau_Q)\big)\,\mathbb{E}\big[S_i(t+k) \,\big|\, \mathbf{Q}(t+k-\tau_Q), \mathbf{Y}(t)\big]. \tag{6.34}$$
Equation (6.34) follows since the scheduling decision under DLQ is completely
determined by the delayed QLI.

Note that the throughput optimal policy minimizes the expression in (6.34);
however, the expectation cannot be computed, because it requires the conditional
distribution of the channel state sequence given the QLI, which in turn requires
knowledge of the arrival rates. However, when the QLI is sufficiently delayed, the
bound in (6.11) can be used to remove the conditioning on the QLI as follows.
$$P\big(S_i(t+k) = s \,\big|\, \mathbf{Q}(t+k-\tau_Q), \mathbf{Y}(t)\big)$$

$$= \sum_{s' \in \mathcal{S}} P\big(S_i(t+k-\tau_Q) = s' \,\big|\, \mathbf{Q}(t+k-\tau_Q), \mathbf{Y}(t)\big)\, P\big(S_i(t+k) = s \,\big|\, S_i(t+k-\tau_Q) = s', \mathbf{Q}(t+k-\tau_Q), \mathbf{Y}(t)\big) \tag{6.35}$$

$$= \sum_{s' \in \mathcal{S}} P\big(S_i(t+k-\tau_Q) = s' \,\big|\, \mathbf{Q}(t+k-\tau_Q), \mathbf{Y}(t)\big)\, P\big(S_i(t+k) = s \,\big|\, S_i(t+k-\tau_Q) = s'\big) \tag{6.36}$$

$$\ge \sum_{s' \in \mathcal{S}} P\big(S_i(t+k-\tau_Q) = s' \,\big|\, \mathbf{Q}(t+k-\tau_Q), \mathbf{Y}(t)\big)\Big(P_S(s) - \frac{\epsilon}{2}\Big) \tag{6.37}$$

$$= P_S(s) - \frac{\epsilon}{2} \tag{6.38}$$
Equation (6.35) follows from the law of total probability. Due to the Markov property
of the channel state, the state at time t is conditionally independent of Q(t − τQ) given
S(t − τQ), leading to equation (6.36). Equation (6.37) holds from the definition of τQ
in (6.11), which implies the conditional state distribution is within ε/2 of the stationary
distribution.
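The bound used in (6.37) can be checked numerically for the two-state channel: raise the one-step transition matrix to the τQ-th power and verify that every row is within ε/2 of the stationary distribution. A small sketch (illustrative helper names):

```python
def tau_step_matrix(p, q, tau):
    """tau-step transition matrix of the two-state (OFF=0, ON=1) channel."""
    P = [[1 - p, p], [q, 1 - q]]
    M = [[1.0, 0.0], [0.0, 1.0]]   # identity
    for _ in range(tau):
        M = [[sum(M[i][k] * P[k][j] for k in range(2)) for j in range(2)]
             for i in range(2)]
    return M

def within_half_eps(p, q, tau, eps):
    """Check the condition behind (6.37): after tau steps, every conditional
    state distribution is within eps/2 of the stationary distribution."""
    pi = [q / (p + q), p / (p + q)]    # stationary distribution (OFF, ON)
    Pt = tau_step_matrix(p, q, tau)
    return all(abs(Pt[i][j] - pi[j]) <= eps / 2
               for i in range(2) for j in range(2))
```

For example, with p = q = 0.01 and ε = 0.1 the check first passes around τ = 114, consistent with the geometric decay rate |1 − p − q| of the chain.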
The expression in (6.34) can now be bounded in terms of an unconditional
expectation.

$$\sum_{i=1}^{M} Q_i(t+k-\tau_Q)\lambda_i - \sum_{i=1}^{M} Q_i(t+k-\tau_Q)\,\phi_i\big(\mathbf{Q}(t+k-\tau_Q)\big)\,\mathbb{E}\big[S_i(t+k) \,\big|\, \mathbf{Q}(t+k-\tau_Q), \mathbf{Y}(t)\big]$$

$$= \sum_{i=1}^{M} Q_i(t+k-\tau_Q)\lambda_i - \sum_{i=1}^{M} Q_i(t+k-\tau_Q)\,\phi_i\big(\mathbf{Q}(t+k-\tau_Q)\big)\,P\big(S_i(t+k) = 1 \,\big|\, \mathbf{Q}(t+k-\tau_Q), \mathbf{Y}(t)\big) \tag{6.39}$$

$$\le \sum_{i=1}^{M} Q_i(t+k-\tau_Q)\lambda_i - \sum_{i=1}^{M} Q_i(t+k-\tau_Q)\,\phi_i\big(\mathbf{Q}(t+k-\tau_Q)\big)\,P_S(1) + \frac{\epsilon}{2}\sum_{i=1}^{M} Q_i(t+k-\tau_Q) \tag{6.40}$$
Equation (6.39) follows since the channel state is a binary random variable. The
inequality in (6.40) follows from applying (6.38) and upper bounding φi(Q) ≤ 1.
Under the DLQ policy, the total service rate simplifies to

$$\sum_{i=1}^{M} \phi_i\big(\mathbf{Q}(t+k-\tau_Q)\big)\, Q_i(t+k-\tau_Q)\, P_S(1) = P_S(1)\max_i Q_i(t+k-\tau_Q). \tag{6.41}$$
Combining equation (6.41) with equation (6.40) yields

$$\sum_{i=1}^{M} Q_i(t+k-\tau_Q)\lambda_i - P_S(1)\max_i Q_i(t+k-\tau_Q) + \frac{\epsilon}{2}\sum_{i=1}^{M} Q_i(t+k-\tau_Q). \tag{6.42}$$
Now, we reintroduce the stationary policy of Lemma 17 to complete the bound.
Recall that for any λ ∈ Λ, there exists a stationary policy which schedules node i for
transmission with probability αi, and satisfies

$$\lambda_i + \epsilon \le \alpha_i P_S(1) \qquad \forall i \in \{1, \ldots, M\}. \tag{6.43}$$

Note that the ε in the theorem statement and the ε in (6.43) are designed to be equal.
The expression in (6.42) is bounded by adding and subtracting identical terms
corresponding to the stationary policy.

$$\sum_{i=1}^{M} Q_i(t+k-\tau_Q)\lambda_i - P_S(1)\max_i Q_i(t+k-\tau_Q) + \frac{\epsilon}{2}\sum_{i=1}^{M} Q_i(t+k-\tau_Q) + \sum_{i=1}^{M} Q_i(t+k-\tau_Q)\alpha_i P_S(1) - \sum_{i=1}^{M} Q_i(t+k-\tau_Q)\alpha_i P_S(1) \tag{6.44}$$

$$= \sum_{i=1}^{M} Q_i(t+k-\tau_Q)\big(\lambda_i - \alpha_i P_S(1)\big) + \frac{\epsilon}{2}\sum_{i=1}^{M} Q_i(t+k-\tau_Q) + \sum_{i=1}^{M} Q_i(t+k-\tau_Q)\alpha_i P_S(1) - P_S(1)\max_i Q_i(t+k-\tau_Q) \tag{6.45}$$

$$\le -\epsilon\sum_{i=1}^{M} Q_i(t+k-\tau_Q) + \frac{\epsilon}{2}\sum_{i=1}^{M} Q_i(t+k-\tau_Q) + \sum_{i=1}^{M} Q_i(t+k-\tau_Q)\alpha_i P_S(1) - P_S(1)\max_i Q_i(t+k-\tau_Q) \tag{6.46}$$

$$\le -\frac{\epsilon}{2}\sum_{i=1}^{M} Q_i(t+k-\tau_Q) \tag{6.47}$$

Equation (6.46) follows from (6.43), and equation (6.47) follows from the fact that,
since Σi αi ≤ 1, the weighted sum of queue lengths is maximized by placing all the
weight on the largest queue length.
To conclude the proof, the bound in (6.17) is used to relate the QLI at time
t + k − τQ to the queue length at time t − τQ, which is known through knowledge of
Y(t).

$$-\frac{\epsilon}{2}\sum_{i=1}^{M} Q_i(t+k-\tau_Q) \le -\frac{\epsilon}{2}\sum_{i=1}^{M} Q_i(t-\tau_Q) + \frac{\epsilon M k}{2} \tag{6.48}$$
The upper bound in (6.48) for slots k ≥ τQ is combined with the bound in (6.30)
for k < τQ to bound the drift in (6.27).

$$\Delta_T(\mathbf{Y}(t)) \le B' + \tau_Q\sum_{i=1}^{M} Q_i(t-\tau_Q) + \frac{1}{2}\tau_Q^2 M + \sum_{k=\tau_Q}^{T-1}\mathbb{E}\Bigg[-\frac{\epsilon}{2}\sum_{i=1}^{M} Q_i(t-\tau_Q) + \frac{\epsilon M k}{2} \,\Bigg|\, \mathbf{Y}(t)\Bigg] \tag{6.49}$$

$$\le B' + \tau_Q\sum_{i=1}^{M} Q_i(t-\tau_Q) + \frac{1}{2}\tau_Q^2 M - (T-\tau_Q)\frac{\epsilon}{2}\sum_{i=1}^{M} Q_i(t-\tau_Q) + \sum_{k=\tau_Q}^{T-1}\frac{\epsilon M k}{2} \tag{6.50}$$

$$\le B' + \tau_Q\sum_{i=1}^{M} Q_i(t-\tau_Q) + \frac{1}{2}\tau_Q^2 M - (T-\tau_Q)\frac{\epsilon}{2}\sum_{i=1}^{M} Q_i(t-\tau_Q) + \frac{\epsilon}{4}M\big(T^2 - \tau_Q^2\big) \tag{6.51}$$

$$\le B' + \frac{1}{2}\tau_Q^2 M\Big(1 - \frac{\epsilon}{2}\Big) + \frac{\epsilon}{4}MT^2 + \tau_Q\sum_{i=1}^{M} Q_i(t-\tau_Q) - (T-\tau_Q)\frac{\epsilon}{2}\sum_{i=1}^{M} Q_i(t-\tau_Q) \tag{6.52}$$
Thus, for any ξ > 0, and T satisfying

$$T \ge \frac{2(1 + \epsilon/2)\,\tau_Q + 2\xi}{\epsilon}, \tag{6.53}$$

and positive constant K satisfying

$$K = B' + \frac{1}{2}\tau_Q^2 M\Big(1 - \frac{\epsilon}{2}\Big) + \frac{\epsilon}{4}MT^2, \tag{6.54}$$

it follows that

$$\Delta_T(\mathbf{Y}(t)) \le K - \xi\sum_{i=1}^{M} Q_i(t-\tau_Q). \tag{6.55}$$

Thus, for large enough queue backlogs, the T-slot Lyapunov drift is negative, and
from [48] it follows that the overall system is stable.
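The threshold on the frame length T in (6.53) can be seen by requiring the net coefficient of the delayed backlog term in (6.52) to be at most −ξ; a short check:

```latex
% Require the backlog coefficient in (6.52) to be at most -\xi:
\tau_Q - (T - \tau_Q)\frac{\epsilon}{2} \le -\xi
\;\Longleftrightarrow\;
T \ge \tau_Q + \frac{2(\tau_Q + \xi)}{\epsilon}
  = \frac{2(1 + \epsilon/2)\,\tau_Q + 2\xi}{\epsilon},
```

which is exactly (6.53); the remaining constant terms are absorbed into K in (6.54).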
Chapter 7

Concluding Remarks
In this thesis, we have studied the tradeoff between the amount and accuracy of
the available control information and the achievable throughput for opportunistic
scheduling. In wireless networks, the memory in the channel state process can be
used to aid in scheduling, reducing the frequency with which channel state information
(CSI) needs to be acquired. This is essential for coping with the increasing overheads
of future wireless networks. We addressed three fundamental questions pertaining to
the information overheads in wireless scheduling: What is the minimum amount of
information required, what is the best information to learn, and how do we optimally
control the network with limited or inaccurate information?
In Chapter 2, we analyzed channel probing as a means of acquiring network state
information, and we developed optimal probing strategies by determining which channels to probe and how often to probe these channels. We showed that infrequent
channel probing can be used to achieve high throughput in a multichannel system.
In contrast to the work in [2, 71], which established the optimality of the myopic
probe-best policy, we showed that these results no longer hold under a slightly
modified model. In a two-channel system, we proved that probing either channel
results in the same throughput, and in an infinite-channel system, we proved that a
simple alternative, the probe second-best policy, outperforms the probe-best policy
in terms of average throughput. We also proved the optimality of the probe
second-best policy in three-channel systems, and conjecture that probing the
second-best channel is optimal in a general multichannel system. Proving this
conjecture remains an interesting open problem.
Next, we developed a fundamental lower bound on the rate at which CSI needs to
be conveyed to the transmitter in order to achieve a throughput requirement. We
modeled this problem as a causal rate-distortion minimization with a unique
distortion measure that quantifies the achievable throughput as a function of the CSI
at the transmitter. For the case of two channels, we computed a closed-form expression
for the causal rate-distortion function, and proposed a practical encoding algorithm
to achieve the required throughput with limited CSI overhead. Analytic results for
larger systems are an area of future research.
In the second half of the thesis, we analyzed the scheduling problem over a wireless
network. We developed a new model relating CSI delays to distance, reflecting the
effect of transmission and propagation delays in conveying CSI across the network.
Since centralized approaches are constrained to using this delayed information, we
proved that in large networks, or when the channels have little memory, distributed
scheduling can outperform the optimal centralized scheduler. Additionally, we
characterized the expected throughput of centralized and distributed scheduling over
tree and clique topologies.
Lastly, we characterized the effect that the controller location has on the ability to
schedule transmissions over the network in the presence of delayed CSI. We showed
that dynamically relocating the controller based on queue length information balances
throughput across the network and provides system stability. We proposed a
throughput optimal joint controller placement and scheduling policy which stabilizes
the system for any arrival rate within the throughput region. This policy uses delayed
QLI to relocate the controller. Interestingly, when the controller placement cannot
depend on CSI, significantly delayed QLI is required to decouple the QLI from the
CSI. We investigated this property in general in Chapter 6, and proposed a
throughput optimal scheduling policy for a system in which QLI is available, but
CSI is not.
Bibliography
[1] A. A. Abouzeid and N. Bisnik. Geographic protocol information and capacity
deficit in mobile wireless ad hoc networks. Information Theory, IEEE Trans. on,
57(8):5133–5150, 2011.
[2] S. Ahmad, M. Liu, T. Javidi, Q. Zhao, and B. Krishnamachari. Optimality of
myopic sensing in multichannel opportunistic access. Information Theory, IEEE
Transactions on, 2009.
[3] J. Andrews, S. Shakkottai, R. Heath, N. Jindal, M. Haenggi, R. Berry, D. Guo,
M. Neely, S. Weber, S. Jafar, et al. Rethinking information theory for mobile ad
hoc networks. Communications Magazine, IEEE, 46(12):94–101, 2008.
[4] J. G. Andrews, A. Ghosh, and R. Muhamed. Fundamentals of WiMAX: understanding broadband wireless networking. Pearson Education, 2007.
[5] M. Andrews, K. Kumaran, K. Ramanan, A. Stolyar, R. Vijayakumar, and
P. Whiting. Scheduling in a queuing system with asynchronously varying service
rates. Probability in the Engineering and Informational Sciences, 18(02):191–217,
2004.
[6] T. Berger. Rate-Distortion Theory. Wiley Online Library, 1971.
[7] T. Berger. Explicit bounds to r (d) for a binary symmetric markov source.
Information Theory, IEEE Transactions on, 23(1):52–59, 1977.
[8] D. P. Bertsekas and J. N. Tsitsiklis. Introduction to Probability. Athena
Scientific, 2002.
[9] V. Borkar, S. Mitter, A. Sahai, and S. Tatikonda. Sequential source coding: an
optimization viewpoint. In CDC-ECC’05, pages 1035–1042. IEEE, 2005.
[10] G. D. Celik, L. B. Le, and E. Modiano. Scheduling in parallel queues with
randomly varying connectivity and switchover delay. In INFOCOM, 2011 Proceedings IEEE, pages 316–320. IEEE, 2011.
[11] N. Chang and M. Liu. Optimal channel probing and transmission scheduling for
opportunistic spectrum access. In International Conference on Mobile Computing
and Networking: Proceedings of the 13 th annual ACM international conference
on Mobile computing and networking, 2007.
[12] P. Chaporkar and A. Proutiere. Optimal joint probing and transmission strategy
for maximizing throughput in wireless systems. Selected Areas in Communications, IEEE Journal on, 2008.
[13] M. Chiang and S. Boyd. Geometric programming duals of channel capacity
and rate distortion. Information Theory, IEEE Transactions on, 50(2):245–258,
2004.
[14] Cisco. Cisco visual networking index: Forecast and methodology, 2013–2018, 2014.
[15] T. M. Cover and J. A. Thomas. Elements of information theory. John Wiley &
Sons, 2012.
[16] M. S. Daskin. Network and discrete location: models, algorithms, and applications. John Wiley & Sons, 2011.
[17] A. Dimakis and J. Walrand. Sufficient conditions for stability of longest-queue-first
scheduling: Second-order properties using fluid limits. Advances in Applied
Probability, pages 505–521, 2006.
[18] B. P. Dunn and J. N. Laneman. Basic limits on protocol information in slotted
communication networks. In ISIT 2008, pages 2302–2306. IEEE, 2008.
[19] R. Gallager. Basic limits on protocol information in data communication networks. IEEE Transactions on Information Theory, 1976.
[20] R. G. Gallager. Discrete stochastic processes. Kluwer Academic Publishers, 1996.
[21] R. G. Gallager. Stochastic processes: theory for applications. Cambridge University Press, 2013.
[22] E. N. Gilbert. Capacity of a burst-noise channel. Bell system technical journal,
39(5):1253–1265, 1960.
[23] A. Gopalan, C. Caramanis, and S. Shakkottai. On wireless scheduling with
partial channel-state information. In Proc. Ann. Allerton Conf. Communication,
Control and Computing, 2007.
[24] A. Gorbunov and M. S. Pinsker. Nonanticipatory and prognostic epsilon entropies and message generation rates. Problemy Peredachi Informatsii, 9(3):12–
21, 1973.
[25] R. Gray. Information rates of autoregressive processes. Information Theory,
IEEE Trans. on, 16(4):412–421, 1970.
[26] J. L. Gross and J. Yellen. Graph theory and its applications. CRC press, 2005.
[27] S. Guha, K. Munagala, and S. Sarkar. Jointly optimal transmission and probing strategies for multichannel wireless systems. In Information Sciences and
Systems, 2006 40th Annual Conference on. IEEE, 2006.
[28] S. Guha, K. Munagala, and S. Sarkar. Optimizing transmission rate in wireless
channels using adaptive probes. In ACM SIGMETRICS Performance Evaluation
Review. ACM, 2006.
[29] B. Hajek and G. Sasaki. Link scheduling in polynomial time. Information Theory,
IEEE Transactions on, 34(5):910–917, 1988.
[30] J. Hong and V. O. Li. Impact of information on network performance-an
information-theoretic perspective. In GLOBECOM 2009., pages 1–6. IEEE,
2009.
[31] K. Jagannathan, S. Mannor, I. Menache, and E. Modiano. A state action frequency approach to throughput maximization over uncertain wireless channels.
In INFOCOM, 2011 Proceedings IEEE. IEEE, 2011.
[32] K. Jagannathan, S. Mannor, I. Menache, and E. Modiano. A state action frequency approach to throughput maximization over uncertain wireless channels.
Internet Mathematics, 9(2-3):136–160, 2013.
[33] K. Jagannathan and E. Modiano. The impact of queue length information on
buffer overflow in parallel queues. In Proceedings of the 47th annual Allerton
conference on Communication, control, and computing, pages 1103–1110. IEEE
Press, 2009.
[34] S. Jalali and T. Weissman. New bounds on the rate-distortion function of a
binary markov source. In ISIT 2007., pages 571–575. IEEE, 2007.
[35] L. Jiang and J. Walrand. A distributed csma algorithm for throughput and utility
maximization in wireless networks. IEEE/ACM Transactions on Networking
(TON), 18(3):960–972, 2010.
[36] M. Johnston and E. Modiano. Optimal channel probing in communication systems: The two-channel case. In Global Communications (GLOBECOM), 2013
IEEE International Symposium on. IEEE, 2013.
[37] M. Johnston, E. Modiano, and I. Keslassy. Channel probing in communication
systems: Myopic policies are not always optimal. In Information Theory Proceedings (ISIT), 2013 IEEE International Symposium on, pages 1934–1938. IEEE,
2013.
[38] M. Johnston, E. Modiano, and Y. Polyanskiy. Opportunistic scheduling with
limited channel state information: A rate distortion approach. In Proc. IEEE
ISIT, 2014.
[39] C. Joo, X. Lin, and N. B. Shroff. Understanding the capacity region of the
greedy maximal scheduling algorithm in multihop wireless networks. IEEE/ACM
Transactions on Networking (TON), 17(4):1132–1145, 2009.
[40] C. Joo and N. B. Shroff. Performance of random access scheduling schemes in
multi-hop wireless networks. IEEE/ACM Transactions on Networking (TON),
17(5):1481–1493, 2009.
[41] K. Kar, X. Luo, and S. Sarkar. Throughput-optimal scheduling in multichannel
access point networks under infrequent channel measurements. Wireless Communications, IEEE Transactions on, 2008.
[42] P. Karn. Maca-a new channel access method for packet radio. In ARRL/CRRL
Amateur radio 9th computer networking conference, volume 140, pages 134–140,
1990.
[43] Y. Y. Kim and S.-q. Li. Capturing important statistics of a fading/shadowing
channel for network performance analysis. Selected Areas in Communications,
IEEE Journal on, 17(5):888–901, 1999.
[44] L. B. Le, E. Modiano, C. Joo, and N. B. Shroff. Longest-queue-first scheduling
under sinr interference model. In Proceedings of the eleventh ACM international
symposium on Mobile ad hoc networking and computing, pages 41–50. ACM,
2010.
[45] C.-p. Li and M. J. Neely. Exploiting channel memory for multiuser wireless
scheduling without channel measurement: Capacity regions and algorithms. Performance Evaluation, 68(8):631–657, 2011.
[46] X. Lin and N. B. Shroff. The impact of imperfect scheduling on cross-layer rate
control in wireless networks. In INFOCOM 2005. 24th Annual Joint Conference of the IEEE Computer and Communications Societies. Proceedings IEEE,
volume 3, pages 1804–1814. IEEE, 2005.
[47] E. Modiano, D. Shah, and G. Zussman. Maximizing throughput in wireless
networks via gossiping. In ACM SIGMETRICS Performance Evaluation Review,
volume 34, pages 27–38. ACM, 2006.
[48] M. Neely, E. Modiano, and C. Rohrs. Dynamic power allocation and routing for
time-varying wireless networks. IEEE Journal on Selected Areas in Communications, 23(1):89–103, 2005.
[49] M. J. Neely. Dynamic power allocation and routing for satellite and wireless
networks with time varying channels. PhD thesis, Massachusetts Institute of
Technology, 2003.
[50] M. J. Neely. Stochastic network optimization with application to communication
and queueing systems. Synthesis Lectures on Communication Networks, 3(1):1–
211, 2010.
[51] D. Neuhoff and R. Gilbert. Causal source codes. Information Theory, IEEE
Transactions on, 28(5):701–713, 1982.
[52] M. Newman. Networks: an introduction. Oxford University Press, 2010.
[53] J. Ni, B. Tan, and R. Srikant. Q-csma: Queue-length-based csma/ca algorithms
for achieving maximum throughput and low delay in wireless networks. Networking, IEEE/ACM Transactions on, 20(3):825–836, 2012.
[54] A. Pantelidou, A. Ephremides, and A. L. Tits. Joint scheduling and routing for
ad-hoc networks under channel state uncertainty. In Modeling and Optimization
in Mobile, Ad Hoc and Wireless Networks and Workshops, 2007. WiOpt 2007.
5th International Symposium on, pages 1–8. IEEE, 2007.
[55] C. H. Papadimitriou and K. Steiglitz. Combinatorial optimization: algorithms
and complexity. Courier Dover Publications, 1998.
[56] A. Proutiere, Y. Yi, and M. Chiang. Throughput of random access without
message passing. In CISS, pages 509–514, 2008.
[57] S. Sanghavi, L. Bui, and R. Srikant. Distributed link scheduling with constant
overhead. In ACM SIGMETRICS Performance Evaluation Review, volume 35,
pages 313–324. ACM, 2007.
[58] S. Sarkar and S. Ray. Arbitrary throughput versus complexity tradeoffs in wireless networks using graph partitioning. Automatic Control, IEEE Transactions
on, 53(10):2307–2323, 2008.
[59] D. Astely, E. Dahlman, A. Furuskär, Y. Jading, M. Lindström, and S. Parkvall.
LTE: the evolution of mobile broadband. IEEE Communications Magazine,
47(4):44–51, 2009.
[60] P. A. Stavrou and C. D. Charalambous. Variational equalities of directed information and applications. arXiv preprint arXiv:1301.6520, 2013.
[61] L. Tassiulas and A. Ephremides. Stability properties of constrained queueing
systems and scheduling policies for maximum throughput in multihop radio networks. Automatic Control, IEEE Transactions on, 37(12):1936–1948, 1992.
[62] L. Tassiulas and A. Ephremides. Dynamic server allocation to parallel queues
with randomly varying connectivity. Information Theory, IEEE Transactions
on, 39(2):466–478, 1993.
[63] S. Tatikonda and S. Mitter. Control under communication constraints. Automatic Control, IEEE Transactions on, 49(7):1056–1068, 2004.
[64] M. B. Teitz and P. Bart. Heuristic methods for estimating the generalized vertex
median of a weighted graph. Operations research, 16(5):955–961, 1968.
[65] D. Tse and P. Viswanath. Fundamentals of wireless communication. Cambridge
university press, 2005.
[66] J. Walrand and P. Varaiya. Optimal causal coding-decoding problems. Information Theory, IEEE Transactions on, 29(6):814–820, 1983.
[67] H. S. Wang and N. Moayeri. Finite-state markov channel-a useful model for
radio communication channels. Vehicular Technology, IEEE Transactions on,
44(1):163–171, 1995.
[68] H. Witsenhausen. On the structure of real-time source coders. Bell Syst. Tech.
J, 58(6):1437–1451, 1979.
[69] L. Ying and S. Shakkottai. On throughput optimality with delayed network-state
information. Information Theory, IEEE Transactions on, 2011.
[70] L. Ying and S. Shakkottai. Scheduling in mobile ad hoc networks with topology and channel-state uncertainty. Automatic Control, IEEE Transactions on,
57(10):2504–2517, 2012.
[71] Q. Zhao, B. Krishnamachari, and K. Liu. On myopic sensing for multi-channel
opportunistic access: Structure, optimality, and performance. Wireless Communications, IEEE Transactions on, 2008.
[72] M. Zorzi, R. R. Rao, and L. B. Milstein. On the accuracy of a first-order markov
model for data transmission on fading channels. In Proc. IEEE ICUPC95, pages
211–215. Citeseer, 1995.
[73] G. Zussman, A. Brzezinski, and E. Modiano. Multihop local pooling for distributed throughput maximization in wireless networks. In INFOCOM 2008.
The 27th Conference on Computer Communications. IEEE. IEEE, 2008.