The Role of Control Information in Wireless Link Scheduling

by Matthew R. Johnston

B.S. (EECS), University of California, Berkeley (2008)
S.M. (EECS), Massachusetts Institute of Technology (2010)

Submitted to the Department of Electrical Engineering and Computer Science in partial fulfillment of the requirements for the degree of Doctor of Philosophy at the Massachusetts Institute of Technology, February 2015.

© Massachusetts Institute of Technology 2015. All rights reserved.

Author: Department of Electrical Engineering and Computer Science, December 18, 2014
Certified by: Eytan H. Modiano, Professor, Thesis Supervisor
Accepted by: Leslie A. Kolodziejski, Chairman, Department Committee on Graduate Theses

Abstract

In wireless networks, transmissions must be scheduled to opportunistically exploit the time-varying capacity of wireless channels and achieve maximum throughput. These opportunistic policies require global knowledge of the current network state to schedule transmissions efficiently; however, providing a controller with complete channel state information (CSI) requires significant bandwidth. In this thesis, we investigate the impact of control information on the ability to schedule transmissions effectively. In particular, we study the tradeoff between the availability and accuracy of CSI at the scheduler and the attainable throughput.
Moreover, we investigate strategies for controlling the network with limited CSI.

In the first half of the thesis, we consider a multi-channel communication system in which the transmitter chooses one of M channels over which to transmit. We model each channel state using an ON/OFF Markov process. First, we consider channel probing policies, in which the transmitter probes a channel to learn its state and uses the CSI obtained from channel probes to make a scheduling decision. We investigate optimal channel probing strategies and characterize the tradeoff between probing frequency and throughput. Furthermore, we characterize a fundamental limit on the rate at which CSI must be conveyed to the transmitter in order to meet a constraint on expected throughput. In particular, we develop a novel formulation of the opportunistic scheduling problem as a causal rate distortion optimization of a Markov source.

The second half of this thesis considers scheduling policies under delayed CSI, resulting from the transmission and propagation delays inherent in conveying CSI across the network. By accounting for these delays as they relate to the network topology, we revisit the comparison between centralized and distributed scheduling, showing that there exist conditions under which distributed scheduling outperforms the optimal centralized policy. Additionally, we illustrate that the location of a centralized controller impacts the achievable throughput. We propose a dynamic controller placement framework, in which the controller is repositioned using delayed queue length information (QLI). We characterize the throughput region under all such policies, and propose a throughput-optimal joint controller placement and scheduling policy using delayed CSI and QLI.

Thesis Supervisor: Eytan H. Modiano
Title: Professor

Acknowledgments

This thesis represents the culmination of a great deal of work and effort, the majority of which would not have been possible without the advice, guidance, and support of many people.

First and foremost, I would like to extend my sincerest gratitude to my advisor, Eytan Modiano. From day one, six and a half years ago, Eytan has helped guide my research, and his guidance has shaped me into the researcher and person that I am today. I cannot thank him enough.

I wish to thank Professor Yury Polyanskiy, who collaborated with me to develop a significant portion of this thesis. Yury's endless enthusiasm was refreshing, and drove me to get the most out of my research. I would also like to thank Professor John Tsitsiklis, whose teaching helped lay the foundation for my thesis, and who provided essential guidance for my research at key points throughout this process.

Additionally, much of this research was a product of collaborations with several people. I would like to thank my coauthors, Prof. Hyang-Won Lee and Prof. Isaac Keslassy, with whom I had many technical discussions that helped progress my research.

I am very thankful for the people at CNRG who have shared an office with me at one point or another during my time at MIT. A special thanks to Greg Kuperman, Sebastian Neumayer, Guner Celik, Marzieh Parandehgheibi, and Georgios Paschos, who have spent many hours staring at a whiteboard with me, working out problem formulations and proofs. A huge thanks to them and the other members of CNRG for providing a great environment to come to every day.

My time at MIT was so enjoyable because of the friends I've made along the way. Thank you to all of them, who have supported me through the toughest times and helped me celebrate the best. Even though grad school apparently does come to an end, I know the friendships I made here will last forever.
None of this would have been possible without the support and love I received along the way. I am eternally grateful for the love of my family: my parents, Leslie and Tom, and my sisters, Megan and Rachel. Each one of them has shaped me into the person I am, and has pushed me and supported me through my entire academic career. One acknowledgment section could never capture how much you've meant to me.

Lastly, thank you to Marta Flory. Your never-ending support and love helped me get through the toughest parts of this journey. Having you to talk to at the end of every single day has been so important to me; I couldn't imagine doing this without you.

This work was supported by the National Science Foundation through grants CNS-0915988 and CNS-1217048, by the Army Research Office Multidisciplinary University Research Initiative under grant W911NF-08-1-0238, and by the Office of Naval Research under grant N00014-12-1-0064.

Contents

1 Introduction
    1.1 Related Work
        1.1.1 Network Control
        1.1.2 Channel Probing
        1.1.3 Protocol Information
        1.1.4 Scheduling with Delayed CSI
    1.2 Our Contributions
        1.2.1 Channel Probing
        1.2.2 Fundamental Limit on CSI Overhead
        1.2.3 Delayed Channel State Information
        1.2.4 Throughput Optimal Scheduling with Hidden CSI
    1.3 Modeling Assumptions
    1.4 Thesis Outline

2 Channel Probing in Opportunistic Communication Systems
    2.1 System Model
        2.1.1 Notation
        2.1.2 Optimal Scheduling
    2.2 Two-Channel System
        2.2.1 Heterogeneous Channels
        2.2.2 Simulation Results
    2.3 Optimal Channel Probing over Finitely Many Channels
        2.3.1 Three Channel System
        2.3.2 Arbitrary Number of Channels
        2.3.3 Simulation Results
    2.4 Infinite-Channel System
        2.4.1 Probe-Best Policy
        2.4.2 Probe Second-Best Policy
        2.4.3 Round Robin Policy
        2.4.4 Simulation Results
    2.5 Dynamic Optimization of Probing Intervals
        2.5.1 Two-Channel System
        2.5.2 State Action Frequency Formulation
        2.5.3 Infinite-Channel System
    2.6 Conclusion
    2.7 Appendix
        2.7.1 Proof of Lemma 1
        2.7.2 Proof of Theorem 1
        2.7.3 Proof of Theorem 3
        2.7.4 Proof of Lemmas 2 and 3

3 Opportunistic Scheduling with Limited Channel State Information: A Rate Distortion Approach
    3.1 System Model
        3.1.1 Problem Formulation
        3.1.2 Previous Work
    3.2 Rate Distortion Lower Bound
        3.2.1 Traditional Rate Distortion
        3.2.2 Causal Rate Distortion for Opportunistic Scheduling
        3.2.3 Analytical Solution for Two-Channel System
    3.3 Heuristic Upper Bound
        3.3.1 Minimum Distortion Encoding Algorithm
        3.3.2 Threshold-based Encoding Algorithm
    3.4 Causal Rate Distortion Gap
    3.5 Application to Channel Probing
    3.6 Summary
    3.7 Appendix
        3.7.1 Proof of Theorem 14
        3.7.2 Proof of Theorem 15
        3.7.3 Proof of Proposition 4

4 Centralized vs. Distributed: Wireless Scheduling with Delayed CSI
    4.1 Model and Problem Formulation
        4.1.1 Delayed Channel State Information
        4.1.2 Scheduling Disciplines
    4.2 Centralized vs. Distributed Scheduling
    4.3 Tree Topologies
        4.3.1 Distributed Scheduling on Tree Networks
        4.3.2 On Distributed Optimality
        4.3.3 Centralized Scheduling on Tree Topologies
    4.4 Clique Topologies
        4.4.1 Centralized Scheduling
        4.4.2 Distributed Scheduling
        4.4.3 Comparison
    4.5 Simulation Results
    4.6 Partially Distributed Scheduling
    4.7 Conclusion

5 Controller Placement for Maximum Throughput
    5.1 Static Controller Placement
        5.1.1 System Model
        5.1.2 Controller Placement Example
        5.1.3 Optimal Controller Placement
        5.1.4 Effect of Controller Placement
        5.1.5 Controller Placement Heuristic
        5.1.6 Multiple Controllers
    5.2 Dynamic Controller Placement
        5.2.1 Two-Node Example
        5.2.2 Queue Length-based Dynamic Controller Placement
        5.2.3 Controller Placement With Global Delayed CSI
    5.3 Simulation Results
        5.3.1 Infrequent Controller Relocation
    5.4 Conclusion
    5.5 Appendix
        5.5.1 Proof of Theorem 21
        5.5.2 Proof of Corollary 5
        5.5.3 Proof of Lemma 16
        5.5.4 Proof of Theorem 23

6 Scheduling over Time Varying Channels with Hidden State Information
    6.1 System Model
    6.2 Throughput Region
    6.3 Dynamic QLI-Based Scheduling Policy
    6.4 Simulation Results
    6.5 Conclusion
    6.6 Appendix
        6.6.1 Proof of Theorem 26

7 Concluding Remarks

List of Figures

1-1 Example wireless network
1-2 Markov chain describing the channel state evolution of each independent channel. State 0 corresponds to an OFF channel, while state 1 corresponds to an ON channel.
2-1 System model: transmitter and receiver connected through M independent channels
2-2 Markov chain describing the channel state evolution of each independent channel. State 0 corresponds to an OFF channel, while state 1 corresponds to an ON channel.
2-3 Optimal fixed probing interval for a two-channel system as a function of state transition probability p = q. In this example, c = 0.5.
2-4 Throughput under the optimal fixed-interval probing policy for a two-channel system as a function of the state transition probability p = q. In this example, c = 0.5.
2-5 Two asymmetric Markov chains, where 1 − p1 − q1 ≥ 0 and 1 − p2 − q2 ≥ 0.
2-6 Throughput of 'Probe Channel 1' policy and 'Probe Channel 2' policy.
In this example, p1 is varied from 0 to 1/2, and q1 is chosen so that π1 = 3/4. The second channel satisfies p2 = 1/4 and q2 = 1/12, resulting in π2 = π1.
2-7 Comparison of the probe-best policy, the probe second-best policy, and the probe third-best policy as a function of the number of channels in the system. This simulation was run over 2 million probes, with each probe occurring at an interval of 4 time slots.
2-8 Illustration of renewal process. Points represent probing instances, and labels represent probing results. Each renewal interval consists of phase 1 and phase 2.
2-9 Comparison of the probe-best policy and the probe second-best policy for varying probing intervals k. In this example, p = q = 0.05.
2-10 Comparison of the probe-best policy and the probe second-best policy for varying state transition probabilities p = q. In this example, k = 1.
2-11 Optimal decisions based on SAFs. White space corresponds to transient states under the optimal policy; green circles, red boxes, and blue stars correspond to recurrent states where the optimal action is to not probe, probe channel 1, and probe channel 2, respectively.
2-12 Comparison of the expected throughput of the probe-best policy and the round robin policy under fixed intervals and under dynamic intervals. The x-axis plots k, the length of the interval. The maximum of each graph represents the optimal policy in each regime. In this example, p = q = 0.05 and c = 0.5.
2-13 Comparison of the probe-best policy and round robin for varying values of k, the minimum interval between probes. In this example, p = q = 0.1 and c = 0.5.
3-1 System model: a transmitter and receiver connected by M independent channels.
3-2 Markov chain describing the channel state evolution of each independent channel.
3-3 Information structure of an opportunistic communication system. The receiver measures the channel state X(t), encodes it into a sequence Z(t), and transmits this sequence to the transmitter.
3-4 Causal information rate distortion function for different state transition probabilities p, for a two-channel opportunistic scheduling system.
3-5 Definition of K, the time since the last change in the sequence Z(t), with respect to the values of Z(t) up to time t.
3-6 The causal information rate distortion function Rc(D) (Section 3.2) and the upper bound on the rate distortion function (Section 3.3), computed using Monte Carlo simulation. Transition probabilities satisfy p = q = 0.2.
3-7 Rate distortion functions for example systems.
3-8 Causal information rate distortion lower bound, heuristic upper bound, and probing algorithmic upper bound for a two-channel system with p = q = 0.2.
4-1 Feasible link activation under primary link interference. Bold edges represent activated links.
4-2 Markov chain describing the channel state evolution of each independent channel.
4-3 Delayed CSI structure for centralized scheduling. The controller (denoted by a crown) has full CSI of red bold links, one-hop delayed CSI of green dashed links, and two-hop delayed CSI of blue dotted links.
4-4 Example network: all links are labeled by their channel state at the current time. Bold links represent activated links.
4-5 Four-node ring topology.
4-6 Expected sum-rate throughput for centralized and distributed scheduling algorithms over the four-node ring topology, as a function of channel transition probability p.
4-7 Example of combining matchings to generate components. Red links and blue links correspond to maximum cardinality matchings M0 and Mi. The component containing node i is referred to as path Pi.
4-8 Abstract representation of a node n's position on multiple conflicting paths.
4-9 Recursive distributed scheduling over binary trees.
4-10 Example matchings. If link l is required to be in the matching, there exists a new maximal matching including l.
4-11 Threshold value p∗(k) such that for p > p∗(k), distributed scheduling outperforms centralized scheduling on a 2-level, k-ary tree.
4-12 Possible scheduling scenarios for centralized scheduler.
4-13 Threshold value p∗(n) such that for p > p∗(n), distributed scheduling outperforms centralized scheduling on an n-level binary tree.
4-14 A six-node sample network
4-15 Results for the six-node network in Figure 4-14, over a horizon of 100,000 time slots. The plot shows the fraction of the perfect-CSI throughput obtained as a function of p, the transition probability of the channel state.
4-16 A 5x5 grid network
4-17 Results for a 5x5 grid network, over a horizon of 100,000 time slots. The plot shows the fraction of the perfect-CSI throughput obtained as a function of p, the transition probability of the channel state.
4-18 Results for 10-node clique topology, over a horizon of 100,000 time slots. The plot shows the fraction of the perfect-CSI throughput obtained as a function of p, the transition probability of the channel state.
4-19 Example subtrees from tree-partitioning algorithm
4-20 Example partitioning of infinite tree (only first four levels shown). Dashed links, dotted links, and solid links each belong to different subtrees. The solid nodes represent controllers, which are located at the root of each subtree. Nodes labeled B are border nodes.
4-21 Illustration of border link labeling scheme
4-22 Per-link throughput of the tree-partitioning scheme, plotted as a function of transition probability p for various subtree depths.
5-1 Markov chain describing the channel state evolution of each independent channel. State 0 corresponds to an OFF channel, while state 1 corresponds to an ON channel.
5-2 Barbell network
5-3 (Snowflake network) Symmetric network in which node A has degree k1 and node B has degree k2 + 1
5-4 Sum-rate throughput resulting from placing the controller at three possible node locations, with k1 = 4 and k2 = 20, as a function of channel transition probability p = q.
5-5 Evaluation of the controller placement heuristic for the barbell network and various channel transition probabilities p = q.
5-6 14-node NSFNET backbone network (1991)
5-7 Random geometric graph with multiple controllers placed using the myopic placement algorithm, followed by the controller exchange algorithm. Link colors correspond to distance from the nearest controller.
5-8 Wireless downlink
5-9 Example 2-node system model.
5-10 Throughput regions for different controller scenarios. Assume the channel state model satisfies p = 0.1, q = 0.1, and d1(2) = d2(1) = 1.
5-11 Example star network topology where each node measures its own channel state instantaneously and has d-step delayed CSI of each other node.
5-12 Simulation results for different controller placement policies, with channel model parameters p = 0.1, q = 0.1.
5-13 Effect of QLI delay on system stability, for p = q = 0.1. Each curve corresponds to a different value of τQ.
5-14 Two-level binary tree topology.
5-15 Results for different controller placement policies on the tree network in Figure 5-14: DCPS policy with τQ = 150, equal time-sharing, and fixed controller at node 3. Simulation ran for 40,000 time slots with p = q = 0.3.
5-16 Fraction of time each node is selected as the controller under DCPS for the topology in Figure 5-14. Blue bars correspond to a system with p = q = 0.1, and red bars to a system with p = q = 0.3.
6-1 Wireless downlink
6-2 Markov chain describing the channel state evolution of each independent channel. State 0 corresponds to an OFF channel, while state 1 corresponds to an ON channel.
6-3 Symmetric arrival rate versus average queue backlog for a 4-queue system under different DLQ policies. Transition probabilities satisfy p = q = 0.01.
6-4 Symmetric arrival rate versus average queue backlog for a 4-queue system under different DLQ policies. Transition probabilities satisfy p = q = 0.1.

List of Tables

2.1 Comparison of different probing policies for a two-channel system for a fixed probing interval (6) and time horizon 2,000,000.
2.2 Comparison of different probing policies for a fixed probing interval (6) and time horizon 2,000,000. State transition probability p = q = 0.05.
2.3 Example renewal interval starting at time 0 and renewing at time 6k. At each probing interval, the second-best channel is probed.
2.4 Throughput comparison for different probing policies with p = q = 0.05, k = 6. Simulation assumes 500 channels and a time horizon of 1,000,000 probes.
5.1 Results of controller placement problem over the NSFNET topology. Optimal placement is computed by solving (5.7) via brute force, while heuristic refers to (5.8).
5.2 Maximum weight for different controller placement algorithms over random geometric graphs.
6.1 Throughput-optimal policies for different system models. Each column corresponds to a different amount of information at the controller, and each row corresponds to the memory in the channel. S(t) is the channel state at the current slot, and Q(t) is the queue backlog.

Chapter 1
Introduction

Wireless networks have emerged as a fundamental communication architecture in today's society. The growing popularity of mobile data networks has led to an increased demand for many wireless applications.
Today, the use of wireless networks has expanded to include cellular networks, infrastructure-less peer-to-peer mobile ad hoc networks (MANETs), wireless sensor networks, city-wide mesh networks for broadband internet access, and more. Meanwhile, mobile data traffic is expected to increase by an order of magnitude within the next five years [14]. As demand for wireless networks grows, advanced network control schemes must be developed to fully utilize the available capacity of these networks.

Wireless networks introduce several challenges beyond those of their wired counterparts. First, wireless communication occurs over a shared medium, so each transmission is heard by every receiver in the neighborhood of the transmitter. As a result, simultaneous transmissions interfere with one another, causing a significant degradation in throughput. Consequently, transmissions must be scheduled to mitigate this interference. Second, the wireless channel has a time-varying capacity. This phenomenon, referred to as fading, arises on multiple time-scales due to the mobility of users, shadowing from large objects in the environment, and constructive and destructive interference caused by waveforms traversing multiple paths from source to destination [65]. Opportunistically transmitting over high-capacity channels, while avoiding low-quality channels, maximizes throughput over the network.

To address these challenges, research has focused on the development of scheduling algorithms that control transmissions to maximize throughput. The performance of wireless scheduling policies depends on the availability of network state information (NSI) at the controller. For example, opportunistically scheduling transmissions to exploit the time-varying nature of wireless channels requires current channel state information (CSI) [38].
Modern wireless technologies, such as LTE [59] and WiMAX [4], allow channel qualities to be measured on the order of a few milliseconds, and this information can be fed back to the scheduler for use in decision-making. Additionally, as traffic demands fluctuate over time, knowledge of queue length information (QLI) is used to ensure the stable operation of the network [61].

While NSI can be used to improve system performance, significant bandwidth is required to supply NSI to the scheduler, reducing the capacity available for communication. In the current cellular 4G standard, the LTE uplink is designed with a 30% overhead [59]. Furthermore, overheads in military networks have been shown to grow as large as 99% of all packet transmissions [3]. As networks continue to grow and additional wireless applications arise, current networks will not have sufficient capacity to constantly acquire full NSI. Cisco predicts that the number of mobile devices connected to the internet will increase by 50%, and the amount of mobile data will increase 11-fold, by 2018 [14]. Therefore, it is increasingly important to study techniques for communication with reduced NSI overheads.

In this thesis, we investigate the impact of NSI on the ability to effectively control the network. In particular, we study the tradeoff between the amount of CSI and QLI available to the network controller and the throughput attainable by scheduling algorithms utilizing this information. Moreover, we analyze wireless scheduling in scenarios where complete NSI is unavailable, either because it is obtained from only part of the network, obtained infrequently, or delayed as it is conveyed to the controller. We study the effect of this reduced information on the ability to control the network, and develop scheduling schemes based on imperfect NSI.
1.1 Related Work

Several previous works have studied the relationship between the performance of network control tasks and the information required at the controller. These works present different formulations of the wireless scheduling problem, which this section describes.

1.1.1 Network Control

In the past several decades, there has been much work studying the optimal control of wireless networks [5, 10, 29, 33, 41, 46-48, 50, 61, 62, 69]. The area of throughput-optimal scheduling in wireless networks was pioneered by Tassiulas and Ephremides in [61, 62], and later extended in [48]. These works show that the throughput-optimal schedule is given by the max-weight policy, in which link weights are computed as the product of the packet backlog and the current transmission rate over the link. This framework has been extended to other forms of network control, such as routing, congestion control, and quality of service (QoS) utility optimization; see [50] for an overview. In most of these works, current QLI and CSI are assumed to be globally available, and the performance of these algorithms depends on the accuracy of this information.

While global NSI is essential for optimal centralized scheduling, acquiring network-wide CSI and QLI is impractical. A possible solution is to use distributed scheduling policies, which require only local NSI, but compute local rather than global optima, leading to a throughput reduction [46]. Greedy Maximal Scheduling (GMS) was proposed as a low-complexity distributed policy, and has been shown to achieve a fraction of the throughput achievable by a centralized scheme, depending on the topology of the network [17, 39, 44, 73]. In this approach, the maximum-weight transmissions are added to the schedule if they do not interfere with previously scheduled transmissions.
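The greedy selection step described above can be sketched in a few lines. This is an illustrative sketch, not code from the thesis: the link names, weights, and conflict sets below are hypothetical, and interference is modeled abstractly as a per-link conflict set.

```python
# Sketch of Greedy Maximal Scheduling (GMS): links are weighted by
# (queue backlog) x (current transmission rate), and the heaviest
# non-conflicting links are added to the schedule one at a time.

def greedy_maximal_schedule(weights, conflicts):
    """weights: {link: backlog * rate}; conflicts: {link: set of links
    that interfere with it}. Returns the set of activated links."""
    schedule = set()
    blocked = set()
    # Consider links in decreasing order of weight.
    for link in sorted(weights, key=weights.get, reverse=True):
        if weights[link] > 0 and link not in blocked:
            schedule.add(link)
            blocked.add(link)
            blocked |= conflicts.get(link, set())
    return schedule

# Hypothetical 4-link example: a conflicts with b, b conflicts with c.
weights = {"a": 6.0, "b": 4.0, "c": 3.0, "d": 1.0}
conflicts = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}, "d": set()}
print(sorted(greedy_maximal_schedule(weights, conflicts)))  # ['a', 'c', 'd']
```

Note that the greedy pass produces a maximal (not maximum-weight) schedule: link b is skipped because the heavier link a blocks it, which is exactly the local-versus-global gap discussed above.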
Distributed scheduling schemes that approach the centralized throughput region are proposed in [47, 57], but require higher complexity than their greedy counterparts. Additionally, several authors have applied random-access approaches to maximize throughput in a fully distributed manner [35, 40, 53, 56]. However, NSI is required to determine the correct transmission probabilities for these schemes.

1.1.2 Channel Probing

One strategy to obtain local CSI is to explicitly probe channels to learn the current channel state. This is particularly relevant when there are multiple channels over which to communicate, and a transmitter seeks the channel yielding the highest throughput. Several works have studied channel probing in multichannel communication settings [2, 11, 12, 23, 27, 28, 31, 71]. Of particular interest is the work in [2] and [71], in which the authors assume that after a channel is probed, the transmitter must transmit over that channel. They show that the optimal probing policy is a myopic policy, which probes the channel with the highest expected transmission rate. This model is also considered in [31], which characterizes the achievable capacity region as the limit of a sequence of linear programs in terms of state action frequencies with increasingly large state spaces. The works in [11, 12, 23, 27, 28] consider a model where the channel state is independent over time; thus, probing a channel in the current slot yields no information about that channel in the future. Furthermore, these works allow for multiple channel probes per time slot. In [11, 12, 27, 28], probes occur sequentially, and the transmitter determines when to stop probing and either use one of the probed channels or guess the state of an un-probed channel. In [23], all the channel probes occur simultaneously, and the objective is to determine the subset of channels to probe.
1.1.3 Protocol Information

An independent branch of research has applied tools from information theory to characterize the NSI overheads required for various network control tasks. Among the earliest works to do so is Gallager's seminal paper [19], where fundamental lower bounds on the amount of overhead, referred to as protocol information, needed to keep track of source and destination addresses and message starting and stopping times, are derived using rate-distortion theory. Since Gallager's paper, other researchers have also considered information-theoretic approaches to study protocol overheads in simple network settings. A discrete-time analog of Gallager's model is considered in [18], where a rate distortion framework is used to characterize timing overhead in a slotted system. In [1], the authors use a rate distortion framework to calculate the minimum rate at which node location and neighborhood information must be transmitted in a mobile network, and suggest the corresponding impact on network capacity. Additionally, the work in [30] considers an information-theoretic framework to study a simple scheduling problem in a wireless network. These works quantify time-independent NSI; consequently, their approaches do not apply to scenarios in which the network state process has memory, such as opportunistic wireless scheduling.

1.1.4 Scheduling with Delayed CSI

As discussed previously, acquiring up-to-date CSI and QLI from across the network may be unrealistic, especially for large networks. This motivates several works on throughput-optimal scheduling under delayed NSI. In [41], the authors consider a time-slotted system in which CSI and QLI updates are reported only once every T slots, but the transmitter makes a scheduling decision every slot using delayed information. They show that delays in the CSI reduce the achievable throughput region, while delays in QLI do not adversely affect throughput.
In [69], Ying and Shakkottai study throughput-optimal scheduling and routing with delayed CSI and QLI. They show that the throughput-optimal policy activates a max-weight schedule, where the weight on each link is given by the product of the delayed queue length and the conditional expected channel state given the delayed CSI. Additionally, they propose a threshold-based distributed policy which is shown to be throughput optimal among a class of distributed policies. This work is extended in [70], where the authors account for uncertainty in the state of the network topology as well. Lastly, the work in [54] characterizes the stability region of the network when an estimate of the channel state, rather than the true channel state, is available to the transmitter. The throughput-optimal policy in this case is a max-weight-type policy, where the weight is the conditional expected channel state given the estimate.

1.2 Our Contributions

In this thesis, we study the tradeoff between the amount and the accuracy of the NSI available at the transmitters, and the resulting throughput performance of wireless opportunistic scheduling. The first half of the thesis considers a multi-channel wireless system, in which partial CSI is used to manage the control overheads of scheduling policies. We investigate optimal channel probing policies to obtain CSI, and provide a fundamental limit on the rate at which CSI needs to be acquired to ensure a throughput guarantee. The second half of this thesis studies the delays inherent in acquiring CSI from across a network. We analyze the impact of these delays on system performance, and compare the optimal centralized approach using delayed CSI with a distributed approach using only local CSI. Lastly, we study the optimal location at which to place a centralized controller in the network, as a function of the CSI delays at each node. The remainder of this section elaborates on our contributions in these areas.
1.2.1 Channel Probing

Chapter 2 studies channel probing as a means of acquiring CSI. Channel probing is widely used in modern wireless communication systems [59], in which a probing signal is used to learn the current channel states, and this CSI is used to schedule transmissions. However, using channel probing to maintain CSI pertaining to every channel is impractical, and not necessary for efficient communication. Therefore, the transmitter must decide which channels should be probed, and how often to probe these channels. In this thesis, we study the optimal channel probing and transmission policies for opportunistic communication.

To begin, we fix the time interval between channel probes. For a system with two channels, we show that the choice of which channel to probe does not affect the performance of the scheduler, allowing for a closed-form characterization of the expected throughput. When the two channels differ statistically, we identify scenarios in which it is optimal to always probe one over the other. For a system with infinitely many channels, we use renewal theory to characterize the expected throughput of several probing policies, and show that when the transmitter makes independent probing and transmission decisions, the myopic probing policy shown to be optimal in [2,71] is no longer optimal. We conjecture that the policy that probes the channel with the second-best expected channel state is optimal in a general system, and prove its optimality in a system with three channels. We extend this model to allow for a dynamic optimization of probing intervals based on the results of the previous channel probes. We formulate this problem as a Markov decision process and introduce a state action frequency approach to solve it, which results in an arbitrarily good linear program approximation to the optimal probing intervals.
For the case of an infinite channel system, we explicitly characterize the optimal probing interval for several probing policies.

1.2.2 Fundamental Limit on CSI Overhead

One of the goals of this thesis is to characterize a fundamental bound on the rate at which CSI needs to be conveyed to the transmitter to ensure a high throughput. Inspired by the work of Gallager in [19], in Chapter 3 we present a novel information theory-based formulation to quantify this limit. In particular, we consider a transmitter-receiver pair connected through multiple time-varying channels. The receiver feeds CSI back to the transmitter, which schedules transmissions using this information to obtain a high throughput. The problem of minimizing the amount of information required by the transmitter such that it can effectively control the network is formulated as a rate distortion problem. In this work, we design a new distortion metric for opportunistic communication, capturing the impact of CSI availability on throughput. We incorporate a causality constraint into this rate distortion formulation to reflect the practical constraints of a real-time communication system. We compute a closed-form lower bound on the required rate at which CSI must be conveyed to the controller for a two-channel system, where the channel is time-varying according to a Markov process. Additionally, we propose a practical encoding algorithm to achieve the required throughput with limited CSI overhead. This analysis leads to an interesting observation regarding the gap inherent in the causal rate distortion lower bound; we characterize this gap and discuss scenarios under which it vanishes.

1.2.3 Delayed Channel State Information

The second half of this thesis studies the CSI required to make scheduling decisions in a wireless network. Due to the transmission and propagation delays over wireless links, it takes time for each node to acquire CSI pertaining to the other links in the network.
As a consequence, a node has CSI that is delayed with respect to the current state of the channel. In Chapter 4, we propose a new model for CSI delays capturing the effect of distance on CSI accuracy, such that nodes have accurate CSI pertaining to adjacent links, and progressively delayed CSI pertaining to distant links. Thus, any centralized scheduling scheme is inherently restricted to using delayed CSI. An alternative approach is a distributed scheme using current local CSI rather than delayed global CSI; however, distributed approaches make locally optimal decisions which are often globally suboptimal [46]. We illustrate the impact of delayed CSI on the throughput performance of centralized scheduling, and prove that as these delays become significant, there exists a distributed policy that outperforms the optimal centralized policy. We develop sufficient conditions under which such distributed policies exist, and analytically characterize the throughput performance in tree and clique networks. In addition, we propose a hybrid approach combining centralized and distributed scheduling to trade off between using delayed CSI and making suboptimal local decisions.

Since the performance of centralized scheduling depends on the delay of the CSI, the location of the controller impacts the attainable throughput. Therefore, in Chapter 5 we formulate the problem of finding the optimal controller placement over a network. For any fixed controller placement, the links near the controller see a higher expected throughput than links far from the controller, due to the relationship between distance and CSI delay. Consequently, relocating the controller can balance the throughput in the network.

Figure 1-1: Example wireless network
We propose a dynamic controller placement framework, in which the controller is repositioned using globally available information, such as delayed QLI, as it is known that delays in QLI do not affect the throughput optimality of the max-weight policy [41]. We characterize the throughput region under all such policies using distance-based delayed NSI, and propose a throughput-optimal joint controller placement and scheduling policy.

1.2.4 Throughput Optimal Scheduling with Hidden CSI

Lastly, we consider the scenario in which the controller has QLI, but no CSI, available with which to schedule transmissions. As previous work suggests [48], scheduling requires QLI to balance the backlogs throughout the network. However, when the channel state process has memory, using current QLI alone is insufficient to optimally control the network. While it is known that delays in QLI do not negatively impact throughput, we prove that delays in QLI are in fact necessary for throughput optimality. We propose a scheduling policy using delayed QLI and prove its throughput optimality for a wireless downlink. This represents a paradox in which delayed NSI is more useful than current NSI.

Figure 1-2: Markov chain describing the channel state evolution of each independent channel. State 0 corresponds to an OFF channel, while state 1 corresponds to an ON channel.

1.3 Modeling Assumptions

In this work, we model a wireless network as a collection of nodes and links, representing wireless transceivers connected through time-varying channels. A scheduling decision corresponds to a link activation, i.e., a subset of the links over which transmissions occur. To combat interference, we constrain a feasible link activation to be one in which no two adjacent links are activated. This is referred to as the primary interference model or the node-exclusive interference model [29,47,61], and reflects the constraint that neighboring transmissions cannot occur simultaneously.
The success of a wireless transmission depends on the strength of the signal received at the destination. Due to the fading characteristics of the channel, the received signal power fluctuates [65], affecting the ability to decode a transmission. A typical simplifying assumption is that if a packet is received with a signal-to-noise ratio (SNR) above a threshold, the packet is correctly decoded; otherwise, the packet is lost [72]. From this assumption, we adopt a two-state channel model, in which the channel state is either ON or OFF. When the channel is ON, a single packet can be transmitted, while any transmission over an OFF channel fails. We consider a time-slotted system in which the length of each time slot is equal to the time required to transmit a packet. We assume the channel state remains constant throughout a time slot. This is a typical assumption representing a slow-fading environment, where the coherence time of the channel is larger than the duration of a time slot.

We assume each channel has a state independent of every other channel in the network, reflecting diversity in space [65]. However, channels evolve over time according to a Markov chain, as shown in Figure 1-2. This Markov channel state model was introduced by Gilbert in [22], and has been shown to accurately model the time-varying nature of a Rayleigh fading channel [43, 67, 72]. The ON/OFF Markov channel state model has been used in many previous works on throughput-optimal scheduling with partial channel state information [2,10,45,71]. The transition probabilities p and q are related to the coherence time of the channel. The main motivation behind the Markov model in Figure 1-2, aside from its simplicity, is that it captures the memory in the channel state process. We assume the channel model satisfies 1 − p − q ≥ 0, corresponding to channels with positive memory.
In other words, a channel that is currently ON is more likely to be ON in the next slot than a channel that is currently OFF, implying that knowledge of the channel state can be used to estimate the channel state in future time slots. This reduces the amount of overhead required, since the memory in the channel state process can be utilized for scheduling.

1.4 Thesis Outline

The remainder of the thesis is organized as follows. Chapter 2 presents the channel probing framework, and proposes optimal strategies for probing channels to obtain CSI. Chapter 3 derives a fundamental lower bound on the rate of CSI acquisition required by the transmitter using causal rate distortion theory. Chapter 4 studies throughput-optimal scheduling with delayed CSI, and shows that with enough delay, distributed scheduling outperforms centralized scheduling. Chapter 5 studies the problem of throughput-optimal controller placement, in terms of the delayed CSI and QLI available to each node in the network. Lastly, Chapter 6 studies a wireless downlink for which CSI is not available to the base station.

Chapter 2
Channel Probing in Opportunistic Communication Systems

Consider a system in which a transmitter has access to multiple channels over which to communicate. The state of each channel evolves independently of all other channels, and the transmitter does not know the channel states a priori. The transmitter is allowed to probe a single channel after a predefined time interval to learn the current state of that channel. Using the information obtained from the channel probes and the memory in the channel state process, the transmitter selects a channel in each time slot over which to transmit, with the goal of maximizing throughput, i.e., the number of successful transmissions. This framework applies broadly to many opportunistic communication systems, in which there exists a tradeoff between overhead and performance.
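The ON/OFF Markov channel at the heart of this framework is straightforward to simulate. The following sketch (function names and parameter values are ours, purely illustrative) samples a path of the chain with P(OFF→ON) = p and P(ON→OFF) = q, and estimates the one-step transition probabilities to confirm the positive-memory ordering:

```python
import random

def simulate(p, q, T, seed=1):
    """Sample T transitions of the ON/OFF chain: P(OFF->ON) = p, P(ON->OFF) = q."""
    rng = random.Random(seed)
    s, path = 0, [0]
    for _ in range(T):
        u = rng.random()
        s = (1 if u < p else 0) if s == 0 else (0 if u < q else 1)
        path.append(s)
    return path

p, q = 0.1, 0.2                        # 1 - p - q = 0.7 >= 0: positive memory
path = simulate(p, q, T=200_000)
pairs = list(zip(path, path[1:]))
on = [b for a, b in pairs if a == 1]   # next state, given currently ON
off = [b for a, b in pairs if a == 0]  # next state, given currently OFF
p11, p01 = sum(on) / len(on), sum(off) / len(off)
# Positive memory: an ON channel is more likely to be ON in the next slot
# (p11, near 1 - q) than an OFF channel is to turn ON (p01, near p).
assert p11 > p01
```

The long-run fraction of ON slots in such a sample path approaches the stationary probability p/(p + q), the quantity that reappears as π in the analysis below.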
When there is a large number of channels over which to transmit, or a large number of users to transmit to, it may be impractical to learn the channel state information (CSI) of every channel before scheduling a transmission; consequently, the transmitter may be restricted to using partial channel state information, and must use that partial CSI to make a decision. The transmitter must decide how much information to obtain, and which information is needed, in order to make efficient scheduling decisions. In the context of channel probing, the decision of what information to obtain translates to the decision of which channel to probe. We refer to this decision as the probing policy. Similarly, the decision of how much information to acquire translates to deciding how often to probe channels for CSI; this decision is referred to throughout this work as the probing interval. We consider both the scenario in which the probing interval is constant between channel probes, and the scenario in which the probing interval is allowed to vary based on the channel probing history.

In this work, we study channel probing for wireless opportunistic communication, in which the transmitter is able to transmit over a channel other than the one that was probed. In a system with two channels, we show that the choice of which channel to probe does not affect the expected throughput. Additionally, we identify scenarios such that when the probability distribution of the channel state differs between the two channels, it is optimal to always probe one of the channels. For a system with an asymptotically large number of channels, we show that the myopic policy of [2, 71] is no longer optimal. Specifically, we use renewal theory to prove that a simple policy, namely the policy which probes the channel that is second-most likely to be ON, has a higher per-slot expected throughput.
We characterize the per-slot throughput for these policies, and calculate the optimal fixed probing interval as a function of a probing cost. Furthermore, we prove the optimality of this policy for a system of three channels, and conjecture that this policy is in fact optimal for systems with any number of channels. In the second half of the work, we extend our model to allow for a dynamic optimization of the probing intervals based on the results of past channel probes. We formulate the problem as a Markov decision process, and introduce a state action frequency approach to solve for the optimal probing intervals. For the case of an infinite system of channels, we explicitly characterize the optimal probing interval for various probing policies.

The remainder of this chapter is organized as follows. We describe the model and problem formulation in detail in Section 2.1. In Section 2.2, we analyze the channel probing problem for a system with two channels. In Section 2.3, we find the optimal probing policy for a system with three channels, and conjecture the optimal policy in a general system. We extend this to an infinite channel system in Section 2.4, and apply renewal theory to show that the myopic policy is suboptimal, by analytically computing the expected per-slot throughput of another policy which is proven to outperform the myopic policy of [2]. In Section 2.5, we solve for the optimal probing intervals when a fixed cost is associated with probing.

(Preliminary versions of this work appeared in [37] and [36].)

2.1 System Model

Figure 2-1: System model: transmitter and receiver connected through M independent channels

Consider a transmitter and a receiver that communicate using one of M independent channels, as shown in Figure 2-1. Assume time is slotted, and at every time slot each channel is either in an OFF state or an ON state. Channels are i.i.d.
with respect to each other, and evolve across time according to a discrete-time Markov process described by Figure 2-2. At each time slot, the transmitter chooses a single channel over which to transmit. If that channel is ON, then the transmission is successful; otherwise, the transmission fails. We assume the transmitter does not receive feedback regarding previous transmissions (if such feedback exists in the form of higher-layer acknowledgements, it arrives after a significant delay and is not useful for learning the channel state). The objective is to maximize the expected sum-rate throughput, equal to the number of successful transmissions over time.

The transmitter obtains channel state information (CSI) by explicitly probing channels at predetermined intervals. In particular, the transmitter probes the receiver every k slots for the state of one of the channels at the current time. Assume this information is delivered instantaneously, the same assumption made in many previous works (e.g., [2, 23]). The transmitter uses the history of channel probes to make a scheduling decision. We emphasize that the transmitter may use a channel other than the probed channel for transmission. For example, if the transmitter probes a channel and it is found to be OFF, the transmitter can use a different channel which is more likely to be ON.

Figure 2-2: Markov chain describing the channel state evolution of each independent channel. State 0 corresponds to an OFF channel, while state 1 corresponds to an ON channel.

2.1.1 Notation

Let S_i(t) be the state of channel i at time t, where S_i(t) = 1 corresponds to a channel that is ON at time t, and S_i(t) = 0 corresponds to a channel in the OFF state. The transmitter has an estimate of this state based on previous probes and the channel state distribution. Define the belief of a channel to be the probability that the channel is ON given the history of channel probes.
For any channel i that was last probed k slots ago and was in state s_i at that time, the belief x_i is given by

x_i(t) = P(channel i is ON | probing history) = P(S_i(t) = 1 | S_i(t - k) = s_i),   (2.1)

where the second equality follows from the Markov property of the channel state process. The above probability is computed using the k-step transition probabilities of the Markov chain in Figure 2-2:

p_{00}^k = \frac{q + p(1-p-q)^k}{p+q}, \quad p_{01}^k = \frac{p - p(1-p-q)^k}{p+q},
p_{10}^k = \frac{q - q(1-p-q)^k}{p+q}, \quad p_{11}^k = \frac{p + q(1-p-q)^k}{p+q}.   (2.2)

Throughout this work, we assume that 1 - p - q ≥ 0, corresponding to channels with "positive memory." The positive memory property ensures that a channel that was ON k slots ago is more likely to be ON at the current time than a channel that was OFF k slots ago. This allows the transmitter to make efficient scheduling decisions without explicitly obtaining CSI at each time slot. Mathematically, this property is described by the set of inequalities

p_{01}^i \le p_{01}^j \le p_{11}^k \le p_{11}^l \quad \forall i \le j, \; \forall l \le k.   (2.3)

As the CSI of a channel grows stale, the probability that the channel is in the ON state approaches π, the stationary distribution of the chain in Figure 2-2:

\lim_{k \to \infty} p_{01}^k = \lim_{k \to \infty} p_{11}^k = \pi = \frac{p}{p+q}.   (2.4)

Lastly, let τ^k(·) be the function representing the change in belief of a channel over k time slots when no new information regarding that channel is obtained:

\tau^k(x_i) = x_i p_{11}^k + (1 - x_i) p_{01}^k.   (2.5)

This function is used throughout this chapter to analyze the state transition properties of the system.

2.1.2 Optimal Scheduling

Since the objective is to maximize the expected sum-rate throughput, the optimal transmission decision at each time slot is given by the maximum likelihood (ML) rule: transmit over the channel that is most likely to be ON, i.e., the channel with the highest belief. The expected throughput in a time slot is therefore given by

\max_i x_i(t),   (2.6)

where x_i(t) is the belief of channel i at time t.
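The closed forms in (2.2) and the belief update (2.5) are easy to sanity-check numerically. A minimal sketch (helper names are ours, parameters illustrative) compares them against an explicit k-fold matrix product:

```python
def k_step(p, q, k):
    """Closed-form k-step transition probabilities (2.2) of the chain in
    Figure 2-2; returns (p01^k, p11^k)."""
    r = (1 - p - q) ** k
    return (p - p * r) / (p + q), (p + q * r) / (p + q)

def tau(p, q, k, x):
    """Belief update (2.5): probability that a channel with belief x is ON
    k slots later, absent new information."""
    p01, p11 = k_step(p, q, k)
    return x * p11 + (1 - x) * p01

# Cross-check the closed form against an explicit k-fold matrix product.
p, q, k = 0.2, 0.3, 5
P = [[1 - p, p], [q, 1 - q]]              # row 0: from OFF, row 1: from ON
Pk = [[1.0, 0.0], [0.0, 1.0]]
for _ in range(k):
    Pk = [[sum(Pk[i][m] * P[m][j] for m in range(2)) for j in range(2)]
          for i in range(2)]
p01, p11 = k_step(p, q, k)
assert abs(Pk[0][1] - p01) < 1e-12 and abs(Pk[1][1] - p11) < 1e-12
```

The same check confirms the ordering of (2.3) for this example: p01^k lies below the stationary probability π = p/(p + q), and p11^k above it, with both converging to π as k grows, per (2.4).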
Following from the linearity of the state transition function τ^k(x_i) in (2.5) and the positive memory assumption, the optimal scheduling decision remains the same between channel probes, as no additional CSI is obtained.

2.2 Two-Channel System

To begin, we consider a two-channel system, and formulate the optimal probing strategy using dynamic programming (DP) over a finite horizon of length N. Each index n corresponds to a time slot at which a probing decision is made. Assume there are k time slots between channel probes; thus, index n corresponds to time slot t = kn. The system state at each probing index n is the vector (x_1(n), x_2(n)) of beliefs of channel 1 and channel 2, as defined in (2.1).

Let f^k(x_1, x_2) be the accumulated throughput over the k slots between channel probes, when channel 1 is probed. The function f^k(x_1, x_2) is computed by conditioning on the state of channel 1. If channel 1 is ON, which occurs with probability x_1, then the transmitter uses that channel for k slots, resulting in throughput \sum_{i=0}^{k-1} p_{11}^i. If the probed channel is OFF, then the other channel is used for transmission over those k slots, yielding throughput \sum_{i=0}^{k-1} \tau^i(x_2). Consequently, the expected accumulated throughput is given by

f^k(x_1, x_2) = x_1 \sum_{i=0}^{k-1} p_{11}^i + (1 - x_1) \sum_{i=0}^{k-1} \tau^i(x_2).   (2.7)

Similarly, in terms of the above definition, f^k(x_2, x_1) is the accumulated throughput over the k slots between channel probes when channel 2 is probed.

We proceed by developing the DP value function for each probing decision. Let J_n^i be the expected reward after the nth probe if the choice is made to probe channel i at the current probing instance, and the optimal probing policy is then followed for all subsequent probes.
The expected reward after the last probe is given by:

J_N(x_1, x_2) = \max\{ J_N^1(x_1, x_2), J_N^2(x_1, x_2) \}   (2.8)
J_N^1(x_1, x_2) = f^k(x_1, x_2)   (2.9)
J_N^2(x_1, x_2) = f^k(x_2, x_1)   (2.10)

Equations (2.9) and (2.10) follow since N is the final channel probe (in a time horizon of length N), and thus the only reward is the immediate reward, given by (2.7). At probing time 0 ≤ n < N, the expected reward function is defined recursively. If the decision at probe n is to probe channel 1, then an expected throughput of f^k(x_1, x_2) is accumulated between probes n and n+1; at probe n+1, the belief of channel 1 will be p_{11}^k (p_{01}^k) if the probed channel was ON (OFF), and the belief of channel 2, which was not probed, will be τ^k(x_2). Thus, J_n(x_1, x_2) is defined recursively as:

J_n(x_1, x_2) = \max\{ J_n^1(x_1, x_2), J_n^2(x_1, x_2) \}   (2.11)
J_n^1(x_1, x_2) = f^k(x_1, x_2) + x_1 J_{n+1}(p_{11}^k, \tau^k(x_2)) + (1 - x_1) J_{n+1}(p_{01}^k, \tau^k(x_2))   (2.12)
J_n^2(x_1, x_2) = f^k(x_2, x_1) + x_2 J_{n+1}(\tau^k(x_1), p_{11}^k) + (1 - x_2) J_{n+1}(\tau^k(x_1), p_{01}^k)   (2.13)

The dynamic program in (2.8)-(2.13) can be solved to compute the optimal probing policy for the two-channel system. To begin, we prove the following property of the immediate reward after probing, f^k(x_1, x_2).

Lemma 1. f^k(x_1, x_2) = f^k(x_2, x_1).

The proof of Lemma 1 is given in the Appendix. Lemma 1 states that the immediate reward for probing channel 1 is the same as that for probing channel 2, for all probing intervals k. This is a consequence of the transmitter's ability to choose over which channel to transmit after a channel probe, and accounts for the key difference between the model considered in this chapter and the models considered in previous works [2, 71]. Using this result, we present the main result of this section.

Theorem 1.
For a two-user system with independent channels evolving over time according to an ON/OFF Markov chain with transition probabilities p and q, and probing epochs fixed at intervals of k slots, the total reward from probing channel 1 is equal to that from probing channel 2, at every channel probe.

Corollary 1. The channel probing policy which always probes channel 1 (or always probes channel 2) is optimal in a two-channel system.

The proof of Theorem 1 is given in the Appendix, and follows by induction from Lemma 1 and the affine structure of the expected reward functions in (2.8)-(2.13). Corollary 1 follows directly from Theorem 1. Intuitively, when a channel is probed, the transmitter receives the information needed to select the optimal channel to use until the next probe. For example, if the probed channel is ON, it is optimal to transmit over that channel until the next probe occurs. On the other hand, if the probed channel is OFF, it is optimal to transmit over the un-probed channel, because the belief of that channel will always be higher than that of the OFF channel, by the inequalities in (2.3). Thus, the only information required from the channel probe is which channel to transmit over until the subsequent channel probe, and this information can be obtained by probing either channel.

This result is in contrast to the result in [71], which proves that the optimal decision is to probe the channel with the highest belief. However, their model assumes that a transmission must occur over the probed channel, whereas our model allows the transmitter to choose the channel over which to transmit based on the result of the probe. Consequently, the myopic policy of [71] is not uniquely optimal in this setting.

Theorem 1 is used to determine the optimal fixed probing interval. Clearly, probing more frequently yields higher throughput, but requires more resources as well. To capture this, we associate a fixed cost c with each probe.
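Before turning to the probing interval, Lemma 1 and Theorem 1 can be checked numerically. The sketch below (helper names are ours, parameters illustrative) implements (2.7) and the recursion (2.8)-(2.13) directly:

```python
def k_step(p, q, i):
    """(p01^i, p11^i) from the closed form (2.2)."""
    r = (1 - p - q) ** i
    return (p - p * r) / (p + q), (p + q * r) / (p + q)

def f(p, q, k, x1, x2):
    """Immediate reward (2.7): expected throughput over k slots when the
    channel with belief x1 is probed and x2 is the other channel's belief."""
    on = sum(k_step(p, q, i)[1] for i in range(k))
    off = sum(x2 * k_step(p, q, i)[1] + (1 - x2) * k_step(p, q, i)[0]
              for i in range(k))
    return x1 * on + (1 - x1) * off

def J(p, q, k, n, N, x1, x2):
    """Value functions (2.8)-(2.13); returns the pair (J_n^1, J_n^2)."""
    if n == N:
        return f(p, q, k, x1, x2), f(p, q, k, x2, x1)
    p01, p11 = k_step(p, q, k)
    t1 = x1 * p11 + (1 - x1) * p01         # tau^k(x1), as in (2.5)
    t2 = x2 * p11 + (1 - x2) * p01
    nxt = lambda a, b: max(J(p, q, k, n + 1, N, a, b))
    J1 = f(p, q, k, x1, x2) + x1 * nxt(p11, t2) + (1 - x1) * nxt(p01, t2)
    J2 = f(p, q, k, x2, x1) + x2 * nxt(t1, p11) + (1 - x2) * nxt(t1, p01)
    return J1, J2

# Lemma 1: the immediate reward is symmetric in its arguments.
assert abs(f(0.2, 0.3, 4, 0.1, 0.9) - f(0.2, 0.3, 4, 0.9, 0.1)) < 1e-12
# Theorem 1: probing either channel yields the same total reward.
J1, J2 = J(0.2, 0.3, 2, 0, 3, 0.4, 0.7)
assert abs(J1 - J2) < 1e-9
```

The brute-force recursion branches on each probe outcome, so it is only practical for short horizons, but it makes the symmetry of the two probing decisions concrete.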
The goal is to determine the probing interval k that maximizes the difference between the throughput earned and the cost accumulated.

Theorem 2. Assume a fixed-interval probing scheme with probing cost c. The optimal probing interval is given by

k^* = \arg\max_k \frac{\pi p_{10}^k - c(p+q)}{k(p+q)}.   (2.14)

Proof. From Corollary 1, the optimal probing policy is that which always probes channel 1. Under this policy, the belief of channel 2 equals the steady-state probability π of being in the ON state, given in (2.4). Channel 1 is probed every time, and will be ON a fraction π of the time. When channel 1 is ON, a throughput of \sum_{i=0}^{k-1} p_{11}^i is obtained, and when it is OFF, the throughput is simply πk, the expected throughput yielded by channel 2 over an interval of duration k. Consequently, the expected per-slot throughput accounting for the cost of probing is given by

\frac{1}{k}\Big( -c + \pi \sum_{i=0}^{k-1} p_{11}^i + (1-\pi)\pi k \Big) = -\frac{c}{k} + \pi + \frac{\pi p_{10}^k}{k(p+q)}.   (2.15)

The proof follows by maximizing the above expression with respect to k.

Figure 2-3: Optimal fixed probing interval for a two-channel system as a function of the state transition probability p = q. In this example, c = 0.5.

Figure 2-3 shows the optimal probing interval as a function of the state transition probability p. As p increases, each probe gives less information for the same cost.

Figure 2-4: Throughput under the optimal fixed-interval probing policy for a two-channel system as a function of the state transition probability p = q. In this example, c = 0.5.
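Theorem 2 lends itself to a direct numerical search over k. In the sketch below (function names are ours), the per-slot objective (2.15) is maximized over a finite range of intervals and cross-checked against the conditioning argument in the proof; note that (2.14) differs from (2.15) only by the constant π, so either can be maximized:

```python
def per_slot_reward(p, q, c, k):
    """Per-slot reward (2.15) of the fixed-interval, always-probe-channel-1 policy."""
    pi = p / (p + q)
    p10k = (q - q * (1 - p - q) ** k) / (p + q)
    return -c / k + pi + pi * p10k / (k * (p + q))

def optimal_interval(p, q, c, kmax=1000):
    """k* of (2.14), found by exhaustive search over 1..kmax."""
    return max(range(1, kmax + 1), key=lambda k: per_slot_reward(p, q, c, k))

# Cross-check (2.15) against the direct conditioning argument in the proof:
# per-slot reward = (1/k)( -c + pi * sum_i p11^i + (1 - pi) * pi * k ).
p, q, c, k = 0.1, 0.2, 0.5, 7
pi = p / (p + q)
p11 = lambda i: (p + q * (1 - p - q) ** i) / (p + q)
direct = (-c + pi * sum(p11(i) for i in range(k)) + (1 - pi) * pi * k) / k
assert abs(direct - per_slot_reward(p, q, c, k)) < 1e-12
```

Sweeping `optimal_interval` over p = q reproduces the qualitative behavior of Figure 2-3: as p grows, k* first shrinks and eventually probing ceases to pay for its cost.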
Thus, as the transition probability starts to increase, the optimal probing interval decreases, since information must be obtained more frequently to compensate for the reduced information in each probe. As p continues to grow, the reward from probing becomes smaller than the cost of probing, and it becomes optimal not to probe at all. Figure 2-4 shows the throughput under the optimal probing interval from Theorem 2 for various transition probabilities. As the state transition probability increases, throughput decreases. Note that the optimal throughput does not drop below the steady-state probability π: beyond that point it is optimal not to probe, due to the high probing cost, and simply to guess which channel to use.

Theorems 1 and 2 combine to characterize the optimal fixed-interval probing policy for a two-channel system. However, when the two channels are not identically distributed, the optimal probing decision depends on the channel statistics, as shown in Section 2.2.1. Furthermore, if the probing epochs are not fixed, the decision to probe depends on the results of the previous probe, yielding an advantage to probing one channel over the other, as shown in Section 2.5.

Figure 2-5: Two asymmetric Markov chains, (a) channel 1 and (b) channel 2, where 1 - p_1 - q_1 ≥ 0 and 1 - p_2 - q_2 ≥ 0.

2.2.1 Heterogeneous Channels

In this section, we extend the results of the previous section to the case where the two channels differ statistically, i.e., channel 1 evolves in time according to the Markov chain in Figure 2-5a, and channel 2 evolves according to the chain in Figure 2-5b. Denote the k-step transition probabilities of channel 1 by a_{i,j}^k and those of channel 2 by b_{i,j}^k.
    a_{11}^k = \frac{p_1 + q_1(1-p_1-q_1)^k}{p_1+q_1}, \qquad b_{11}^k = \frac{p_2 + q_2(1-p_2-q_2)^k}{p_2+q_2},    (2.16)

    a_{01}^k = \frac{p_1 - p_1(1-p_1-q_1)^k}{p_1+q_1}, \qquad b_{01}^k = \frac{p_2 - p_2(1-p_2-q_2)^k}{p_2+q_2}.    (2.17)

Additionally, let π_1 and π_2 be the steady-state ON probabilities of channel 1 and channel 2, respectively. Intuitively, it is optimal to probe the channel with more memory, as that probe yields more information. For example, consider a channel that varies rapidly, with p_1 = q_1 = 1/2 - ε, and a channel which rarely changes state, with p_2 = q_2 = ε, for some small ε > 0. Probing the low-memory channel provides accurate information for a few time slots, but that information quickly becomes stale, and the transmitter effectively guesses which channel is ON until the next probe. On the other hand, probing the high-memory channel yields information that remains accurate for many time slots after the probe. This intuition is confirmed in the following result.

Theorem 3. For a two-user system with channel states evolving as in Figure 2-5, and probing instances fixed to intervals of k slots, if p_1, p_2, q_1, q_2 satisfy

    b_{11}^i \ge a_{11}^i \quad \forall i,    (2.18)

then the optimal probing policy is to probe channel 2 at all probing instances.

The proof of Theorem 3 is given in the Appendix, and follows by reverse induction over the channel probing instances. To highlight its significance, we present the following corollaries.

Corollary 2. Assume the two channels satisfy π_1 = π_2, and that p_1 + q_1 ≥ p_2 + q_2. Then, the optimal policy is to always probe channel 2.

Proof. We can rewrite the k-step transition probability of the second chain from (2.2) as follows:

    b_{11}^i = \frac{p_2 + q_2(1-p_2-q_2)^i}{p_2+q_2} = \pi_2 + (1-\pi_2)(1-p_2-q_2)^i    (2.19)
             = \pi_1 + (1-\pi_1)(1-p_2-q_2)^i    (2.20)
             \ge \pi_1 + (1-\pi_1)(1-p_1-q_1)^i    (2.21)
             = a_{11}^i,    (2.22)

where (2.20) follows from the assumption that π_1 = π_2, and (2.21) follows from the assumption that p_1 + q_1 ≥ p_2 + q_2. Therefore, b_{11}^i ≥ a_{11}^i, and applying Theorem 3 concludes the proof.

Corollary 3.
Assume the two channels satisfy p_1 + q_1 = p_2 + q_2, and that π_1 ≤ π_2. Then, the optimal policy is to always probe channel 2.

Proof. We can rewrite the k-step transition probability of the second chain from (2.2) as follows:

    b_{10}^i = \frac{q_2(1-(1-p_2-q_2)^i)}{p_2+q_2} = (1-\pi_2)(1-(1-p_2-q_2)^i)    (2.23)
             = (1-\pi_2)(1-(1-p_1-q_1)^i)    (2.24)
             \le (1-\pi_1)(1-(1-p_1-q_1)^i)    (2.25)
             = a_{10}^i,    (2.26)

where (2.24) follows from the assumption that p_1 + q_1 = p_2 + q_2, and the inequality in (2.25) follows from the assumption that π_1 ≤ π_2. Since b_{10}^i ≤ a_{10}^i, we have b_{11}^i ≥ a_{11}^i, and Theorem 3 can be applied to complete the proof.

The above two corollaries describe scenarios where asymmetries in the channel statistics result in an optimal policy that always probes one of the two channels. This is in contrast to Theorem 1, where the channels are homogeneous and probing either channel yields the same throughput. Corollary 2 states that if the channels are equally likely to be ON in steady state, the optimal decision is to probe the channel with the smaller p_i + q_i. In this context, p_i + q_i is the rate at which the channel approaches steady state. In particular, the Markov channel state approaches its stationary distribution exponentially at a rate equal to the second eigenvalue of the transition probability matrix, which for a two-state chain is 1 - p - q [21]. The channel which approaches steady state more slowly is the channel with more memory, confirming our intuition that probing the channel with more memory is always optimal. Corollary 3 applies to a system in which the rate at which steady state is reached is the same for both channels, but channel 2 is more likely to be ON in steady state than channel 1. In this case, it is optimal to probe the channel with the higher steady-state probability of being ON at all probing instances.
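The dominance condition (2.18) is easy to check numerically for concrete parameters. The sketch below is our illustration, for the pair p_1 = 0.3, q_1 = 0.1 and p_2 = 0.15, q_2 = 0.05, which satisfies the hypotheses of Corollary 2 (π_1 = π_2 = 0.75 and p_1 + q_1 ≥ p_2 + q_2); the i-step ON-to-ON probability π + (1-π)(1-p-q)^i follows the convention of (2.16):

```python
# Sketch: check condition (2.18) of Theorem 3 for a concrete pair of
# heterogeneous channels satisfying the hypotheses of Corollary 2:
# p1 = 0.3, q1 = 0.1 and p2 = 0.15, q2 = 0.05, so pi1 = pi2 = 0.75 and
# p1 + q1 >= p2 + q2. Convention: the i-step ON-to-ON probability is
# pi + (1 - pi) * (1 - p - q)**i, as in (2.16).

def on_to_on(p, q, i):
    pi = p / (p + q)
    return pi + (1 - pi) * (1 - p - q) ** i

ok = all(on_to_on(0.15, 0.05, i) >= on_to_on(0.3, 0.1, i)
         for i in range(1, 200))
print(ok)  # -> True: condition (2.18) holds, so always probe channel 2
```

Since both chains share π = 0.75, the check reduces to (1 - 0.2)^i ≥ (1 - 0.4)^i, which holds for all i, matching the eigenvalue intuition above.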
2.2.2 Simulation Results

We simulate the evolution of a two-channel system over time, and compare different fixed probing policies in terms of average throughput. We assume a time horizon of 2,000,000 probes, with a probe occurring every 6 slots. We consider five deterministic stationary channel probing policies: always probe channel 1, always probe channel 2, probe the channel with the higher belief, probe the channel with the lower belief, and alternate between the channels (round robin). The average throughput under each of these policies is shown in Table 2.1.

    Policy              | p_1 = q_1 = 0.1     | p_1 = 0.3, q_1 = 0.1    | p_1 = q_1 = 0.1
                        | p_2 = q_2 = 0.1     | p_2 = 0.15, q_2 = 0.05  | p_2 = 0.15, q_2 = 0.05
    Probe Channel 1     | 0.6536              | 0.8240                  | 0.7899
    Probe Channel 2     | 0.6540              | 0.8652                  | 0.8027
    Probe Best Channel  | 0.6538              | 0.8450                  | 0.8030
    Probe Worst Channel | 0.6538              | 0.8402                  | 0.7902
    Round Robin         | 0.6532              | 0.8452                  | 0.7981

Table 2.1: Comparison of different probing policies for a two-channel system, for a fixed probing interval (6) and time horizon 2,000,000.

The first column of Table 2.1 shows that for a system with two i.i.d. channels with parameters p = q = 0.1, the choice of channel probing policy does not affect the average reward earned by the system, as predicted by Theorem 1. Additionally, we simulate a system with two statistically different channels; these results are shown in the second and third columns of Table 2.1. The first simulation (column 2) uses two channels with the same steady-state probability (π = 0.75), but with channel 1 approaching steady state at a faster rate than channel 2. By Corollary 2, the optimal probing policy is to always probe channel 2, which is consistent with the simulation. The second simulation (column 3) uses two channels satisfying p_1 + q_1 = p_2 + q_2 = 0.2 and π_2 > π_1, as in Corollary 3. As expected, probing channel 2 is optimal (after accounting for noise in the simulation measurements).
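The experiment behind Table 2.1 can be reproduced with a short Monte Carlo sketch (our reconstruction, not the thesis code; a shorter horizon is used here). For the column-2 parameters, always probing channel 2 should dominate always probing channel 1, per Corollary 2:

```python
import random

# Monte Carlo sketch of the two-channel experiment (our reconstruction).
# Each channel is an ON/OFF chain with P(OFF->ON) = p and P(ON->OFF) = q.
# Every k slots one channel is probed; if it is ON the transmitter uses it
# until the next probe, otherwise it uses the other channel (cf. Theorem 1).

def step(state, p, q, rng):
    if state == 1:
        return 0 if rng.random() < q else 1
    return 1 if rng.random() < p else 0

def simulate(probe_ch, p1, q1, p2, q2, k=6, probes=100_000, seed=1):
    rng = random.Random(seed)
    s = [1, 1]
    reward = 0
    for _ in range(probes):
        use = probe_ch if s[probe_ch] == 1 else 1 - probe_ch
        for _ in range(k):
            reward += s[use]
            s = [step(s[0], p1, q1, rng), step(s[1], p2, q2, rng)]
    return reward / (probes * k)

# Column-2 scenario of Table 2.1: pi1 = pi2 = 0.75, channel 2 has more memory.
t1 = simulate(0, 0.3, 0.1, 0.15, 0.05)
t2 = simulate(1, 0.3, 0.1, 0.15, 0.05)
print(t1, t2)  # probing channel 2 should win, consistent with Corollary 2
```

With this horizon the sampling noise is well below the roughly 0.04 throughput gap reported in Table 2.1 for this scenario.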
In this scenario, probing the channel with the higher belief is also a good policy, since the channel with the higher steady-state probability has the higher belief more often.

Figure 2-6 plots the throughput obtained by the policy which always probes channel 1 against the policy that always probes channel 2, for a sample set of parameters. For the second channel, p_2 = 1/4 and q_2 = 1/12, so that π_2 = 3/4. For channel 1, π_1 is fixed at 3/4, but p_1 is varied from 0 to 1/2. When channel 1 has more memory than channel 2, probing channel 1 yields much higher throughput than the alternative. In this example, when p_1 is very small and channel 1 has a high degree of memory, probing channel 1 results in a 15% throughput improvement over probing channel 2.

Figure 2-6: Throughput of the 'Probe Channel 1' policy and the 'Probe Channel 2' policy. In this example, p_1 is varied from 0 to 1/2, and q_1 is chosen so that π_1 = 3/4. The second channel satisfies p_2 = 1/4 and q_2 = 1/12, resulting in π_2 = π_1.

Theorem 1 and Theorem 3 describe scenarios in which probing one of the two channels at all probing instances is optimal. The simplicity of the optimal probing policy in these cases is an artifact of the transmitter having only two channels from which to choose. As the number of channels increases, a policy that always probes one of the channels is suboptimal. Therefore, additional analysis is required for a system with more than two channels.

2.3 Optimal Channel Probing over Finitely Many Channels

As mentioned above, for systems with more channels, i.e., M > 2, the policy of always probing one of the channels is suboptimal. In particular, the optimal probing policy is a function of the beliefs of the channels.
In this section, we show that the policy which probes the channel with the second-highest belief is optimal for a system of three channels, and conjecture an extension to a general system of finitely many channels.

2.3.1 Three-Channel System

To begin, consider a system of three channels, with channel states identically distributed according to the Markov chain in Figure 2-2. The following result characterizes the optimal channel probing policy as a function of the beliefs of the three channels.

Theorem 4. In a system of three channels, where a single channel is probed every k slots, the optimal probing policy is to probe the channel with the second-highest belief.

Denote by x_i the belief of the channel with the i-th largest belief; thus, x_1 ≥ x_2 ≥ x_3. The probe second-best policy probes the channel with belief x_2. If that channel is ON, the transmitter uses that channel for the next k slots. After these k slots, the best channel is the channel that was last probed, with belief τ^k(1), where τ^k is the information-decay function defined in (2.5). If, on the other hand, the probed channel is OFF, the transmitter uses the channel with the highest belief among the remaining channels, x_1. After k slots, that channel has belief τ^k(x_1), and the belief of the probed channel is the smallest, at τ^k(0). Define a function W_n as follows:

    W_n(x_1, x_2, x_3) \triangleq f^k(x_1, x_2) + x_2 W_{n+1}\bigl(\tau^k(1), \tau^k(x_1), \tau^k(x_3)\bigr) + (1 - x_2) W_{n+1}\bigl(\tau^k(x_1), \tau^k(x_3), \tau^k(0)\bigr)    (2.27)

for all 0 ≤ n ≤ N, where f^k(·) is the immediate reward function defined in (2.7). Let W_{N+1}(x_1, x_2, x_3) = 0 by convention. Note that W_n(x_1, x_2, x_3) is the expected throughput of the probe second-best policy from time n onwards if and only if x_1 ≥ x_2 ≥ x_3.
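The recursion (2.27) can be evaluated numerically as a sanity check on Theorem 4. In the sketch below (our illustration; since the immediate reward f^k of (2.7) is not restated in this section, we use the natural form: probe the channel with belief x_2, transmit on it for the k slots if ON, else transmit on the channel with belief x_1), probing the second-best channel first is no worse than probing the best first for sample beliefs:

```python
# Numerical sketch of the recursion (2.27). The one-slot belief update is
# tau(x) = x*(1-q) + (1-x)*p, and f_k below is our assumed form of the
# immediate reward (2.7): expected throughput over the k slots between probes.

def tau(x, p, q):
    return x * (1 - q) + (1 - x) * p

def tau_k(x, p, q, k):
    for _ in range(k):
        x = tau(x, p, q)
    return x

def f_k(x1, x2, p, q, k):
    on = off = 0.0
    b_on, b_off = 1.0, x1            # belief paths after an ON / OFF probe
    for _ in range(k):
        on += b_on
        off += b_off
        b_on, b_off = tau(b_on, p, q), tau(b_off, p, q)
    return x2 * on + (1 - x2) * off

def W(n, x1, x2, x3, p, q, k, N):
    if n > N:
        return 0.0
    t = lambda x: tau_k(x, p, q, k)
    return (f_k(x1, x2, p, q, k)
            + x2 * W(n + 1, t(1.0), t(x1), t(x3), p, q, k, N)
            + (1 - x2) * W(n + 1, t(x1), t(x3), t(0.0), p, q, k, N))

p, q, k, N = 0.1, 0.1, 4, 8
x1, x2, x3 = 0.9, 0.6, 0.3
w_second = W(0, x1, x2, x3, p, q, k, N)  # probe second-best throughout
w_best = W(0, x2, x1, x3, p, q, k, N)    # probe best first, then second-best
print(w_second >= w_best)
```

The comparison printed here is exactly the kind of ordering that Theorem 4 asserts; the horizon and beliefs are arbitrary sample values.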
Additionally, if x_2 ≥ x_1 ≥ x_3, then W_n(x_1, x_2, x_3) is the expected reward of the policy which probes the channel with the highest belief at index n, and then probes the channel with the second-highest belief at all subsequent times. The following results hold for this definition of W_n, and are used to prove Theorem 4.

Lemma 2. If x_1 ≥ x_2 ≥ x_3, then for all 0 ≤ n ≤ N,

    W_n(x_1, x_2, x_3) \ge W_n(x_2, x_1, x_3).    (2.28)

Lemma 3. If x_1 ≥ x_2 ≥ x_3, then for all 0 ≤ n ≤ N,

    W_n(x_1, x_2, x_3) \ge W_n(x_1, x_3, x_2).    (2.29)

The proofs of Lemmas 2 and 3 are given in the Appendix.

Proof of Theorem 4. Without loss of generality, assume the beliefs of the three channels x_1, x_2, x_3 satisfy x_1 ≥ x_2 ≥ x_3. The proof follows by reverse induction on the probing index n. For n = N, probing the best channel yields throughput W_N(x_2, x_1, x_3), while probing the second- and third-best channels yields throughputs W_N(x_1, x_2, x_3) and W_N(x_1, x_3, x_2), respectively. By Lemma 2, W_N(x_1, x_2, x_3) ≥ W_N(x_2, x_1, x_3), and by Lemma 3, W_N(x_1, x_2, x_3) ≥ W_N(x_1, x_3, x_2); therefore, probing the second-best channel is optimal at n = N. Now assume it is optimal to probe the second-best channel at probes n + 1, ..., N. At probing instance n, the throughputs of the three potential choices are given by W_n(x_2, x_1, x_3), W_n(x_1, x_2, x_3), and W_n(x_1, x_3, x_2) for probing the best, second-best, and third-best channels, respectively. By Lemma 2, W_n(x_1, x_2, x_3) ≥ W_n(x_2, x_1, x_3), and by Lemma 3, W_n(x_1, x_2, x_3) ≥ W_n(x_1, x_3, x_2); therefore, probing the second-best channel is optimal at n as well. By induction, probing the second-best channel is optimal at all probing times.

This result is notable, as it contradicts the earlier result in [2], which showed that the policy probing the best channel is optimal for the model in which the transmitter must use the probed channel for transmission.
In our model, the transmitter can collect CSI separately from the transmission decision, and therefore probing the second-best channel yields a higher throughput. Further intuition as to why the probe second-best policy is optimal is presented in Section 2.4.2.

2.3.2 Arbitrary Number of Channels

Theorem 4 shows that the probe second-best policy is optimal for a system of three channels. In general, for M > 3, we conjecture that the probe second-best policy remains optimal.

Conjecture 1. The probe second-best policy is optimal among all channel probing policies for fixed probing intervals k.

The proof used for the M = 3 channel case does not extend to M ≥ 4. In [2], the authors used a coupling argument to circumvent this issue and prove the optimality of the myopic policy for their setting in general networks. However, due to the additional complexity of the probe second-best policy, this coupling argument does not hold in our setting. Instead, we believe the general case can be proven by bounding the maximum difference in expected reward from being in a better state after probing the k-th best channel for k ≥ 2, and showing that this extra reward must be less than the gain in immediate expected reward that probing the second-best channel offers.

2.3.3 Simulation Results

For a system with more than two channels, we compare various probing policies; the results, shown in Table 2.2, support Conjecture 1. In addition to the policies considered in the previous section, we include the probe second-best policy, and a policy probing the third-best channel for comparison. In all scenarios, the probe second-best policy outperforms the other probing policies, supporting our conjecture. However, the advantage of the probe second-best policy over similar policies, such as probe best and probe third-best, is relatively small.
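The M-channel comparison of Table 2.2 can be sketched as follows (our reconstruction, not the thesis code; beliefs are tracked exactly and the transmitter sends on the channel it believes most likely to be ON in each slot):

```python
import random

# Monte Carlo sketch of the M-channel experiment (our reconstruction).
# rank_to_probe selects which belief-ranked channel is probed every k slots:
# 0 = probe best, 1 = probe second-best, 2 = probe third-best.

def simulate(M, rank_to_probe, p=0.05, q=0.05, k=6, probes=30_000, seed=0):
    rng = random.Random(seed)
    pi = p / (p + q)
    state = [1 if rng.random() < pi else 0 for _ in range(M)]
    belief = [pi] * M
    reward = 0
    for _ in range(probes):
        order = sorted(range(M), key=lambda i: -belief[i])
        probed = order[min(rank_to_probe, M - 1)]
        belief[probed] = float(state[probed])  # probe reveals the true state
        for _ in range(k):
            use = max(range(M), key=lambda i: belief[i])
            reward += state[use]
            for i in range(M):
                if state[i] == 1:
                    state[i] = 0 if rng.random() < q else 1
                else:
                    state[i] = 1 if rng.random() < p else 0
                belief[i] = belief[i] * (1 - q) + (1 - belief[i]) * p
    return reward / (probes * k)

best = simulate(5, 0)
second = simulate(5, 1)
print(best, second)  # probe second-best should come out ahead (cf. Table 2.2)
```

The roughly 0.015 throughput gap reported for five channels is well above the sampling noise at this horizon.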
In Figure 2-7, we compare the performance of the probe-best policy, the probe second-best policy, and the probe third-best policy as a function of the number of channels in the system, for a fixed probing interval. As the number of channels grows, the gap in performance between the probe-best policy and the probe second-best policy increases. Furthermore, the probe third-best policy becomes more efficient as the number of channels increases, but does not reach the throughput of the probe second-best policy.

    Policy             | 3 Channels | 5 Channels | 7 Channels | 10 Channels
    Probe Channel 1    | 0.6955     | 0.6959     | 0.6957     | 0.6958
    Probe Best Channel | 0.7455     | 0.7640     | 0.7650     | 0.7659
    Probe Second-Best  | 0.7553     | 0.7787     | 0.7799     | 0.7808
    Probe Third Best   | 0.6849     | 0.7617     | 0.7691     | 0.7706
    Probe Worst        | 0.6860     | 0.6804     | 0.6810     | 0.6806
    Round Robin        | 0.7460     | 0.7649     | 0.7658     | 0.7661

Table 2.2: Comparison of different probing policies for a fixed probing interval (6) and time horizon 2,000,000. State transition probability p = q = 0.05.

Figure 2-7: Comparison of the probe-best policy, the probe second-best policy, and the probe third-best policy as a function of the number of channels in the system. This simulation was run over 2 million probes, with probes at intervals of 4 time slots.

2.4 Infinite-Channel System

As the number of channels increases, the state space grows large and the probing formulation becomes more difficult to analyze. However, as the number of channels grows to infinity, the state space of the system can be simplified. For an infinite-channel system, whenever a probed channel is OFF, it is effectively removed from the system. This is because there always exists a channel which has not been probed in the previous N slots, for any finite N, and thus its belief is equal to the steady-state ON probability π.
Therefore, since an OFF channel has belief p_{01}^k ≤ π for any finite k, it is never optimal to transmit over that channel. In this section, we use the infinite-channel assumption to characterize the average throughput under several probing policies. We consider the myopic policy, which is shown to be optimal for the model in [2, 71], as well as a round robin policy which probes channels sequentially. In addition, we characterize the throughput of the probe second-best policy, which is conjectured in Section 2.3 to be the optimal probing policy for a finite number of channels, and prove that it outperforms the other two policies in this setting.

2.4.1 Probe-Best Policy

To begin, consider the probe-best policy, which probes the channel with the highest belief. This policy is commonly referred to as a myopic or greedy policy, as it maximizes the immediate reward without regard to future rewards. Intuitively, such a policy is advantageous because the channel with the highest belief is the most likely to be ON at the current time, yielding the highest expected throughput. Recall that this policy is shown to be optimal for the model in [2, 71]. For our model, we have the following results.

Theorem 5. The state of the system is given by an infinite vector of beliefs, one for each channel. Without loss of generality, assume this vector is sorted as x = {x_1, x_2, ...} such that x_1 ≥ x_2 ≥ x_3 ≥ ... . The class of recurrent states under the probe-best policy satisfies x_1 ≥ π, and x_i = π for all other channels i ≠ 1.

Proof. The probe-best policy probes the channel with belief x_1. If this channel is ON, its belief becomes p_{11}^1 in the next slot, and it remains the channel with the highest belief by the inequalities in (2.3). If that channel is OFF, it is removed from the system as per the infinite-channel assumption. Therefore, the vector consisting of x_i = π for all i is reachable from any state. This state corresponds to the transmitter having no information about the network.
The only other states reachable from this state are reached when an ON channel is found, at which point the system returns to a state satisfying x_1 ≥ π, and x_i = π for all i ≠ 1.

Theorem 6. Assume the transmitter makes probing decisions every k slots according to the probe-best policy. The expected per-slot throughput is given by

    E[\text{Thpt}] = \pi + \frac{\pi p_{10}^k}{k(p+q)(p_{10}^k + \pi)}.    (2.30)

Proof. We use renewal theory to compute the average throughput. Under the probe-best policy, Theorem 5 states that only one channel can have belief greater than π. Define a renewal to occur immediately prior to probing a channel with belief π. If a channel is probed and found OFF, it is removed from the system and a renewal occurs k slots later (before the next probe). If the channel is ON, that channel is probed at all future probing instances until it is found to be OFF. The expected inter-renewal time \bar{X}_B is given by

    \bar{X}_B = (1-\pi)k + \pi(kE[N] + k)    (2.31)
              = k + k\pi E[N],    (2.32)

where N is a random variable denoting the number of times an ON channel is probed before it is found OFF, and is geometrically distributed with parameter p_{10}^k. Equation (2.32) reduces to

    \bar{X}_B = k + \frac{\pi k}{p_{10}^k}.    (2.33)

The expected reward \bar{R}_B accumulated over a renewal interval is πk for the interval immediately after the OFF probe, and \sum_{i=0}^{k-1} p_{11}^i for each subsequent ON probe. If the first probe is ON, then there will be N probes until the final OFF probe. Thus, the expected accumulated reward over a renewal interval is expressed as

    \bar{R}_B = (1-\pi)\pi k + \pi\Bigl(\pi k + E\Bigl[N \sum_{i=0}^{k-1} p_{11}^i\Bigr]\Bigr)    (2.34)
              = \pi k + \pi E[N] \sum_{i=0}^{k-1} p_{11}^i = \pi k + \frac{\pi \sum_{i=0}^{k-1} p_{11}^i}{p_{10}^k}.    (2.35)

Using results from renewal-reward theory [20], the average per-slot reward is given by the ratio of the expected reward over a renewal interval to the expected length of that interval:

    \frac{\bar{R}_B}{\bar{X}_B} = \frac{\pi k p_{10}^k + \pi \sum_{i=0}^{k-1} p_{11}^i}{k p_{10}^k + \pi k} = \pi + \frac{\pi p_{10}^k}{k(p+q)(p_{10}^k + \pi)}.    (2.36)

Observe that the per-slot throughput is always larger than π, and decreases toward π as k increases.
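The closed form (2.30) is easy to evaluate numerically (a sketch, under the conventions π = p/(p+q) and p_{10}^k = (1-π)(1-(1-p-q)^k)); consistent with the observation above, the throughput stays above π and decays toward it as k grows:

```python
# Sketch: evaluate the probe-best throughput formula (2.30).
# Conventions: pi = p/(p+q), p10^k = (1 - pi)*(1 - (1-p-q)**k).

def probe_best_throughput(p, q, k):
    pi = p / (p + q)
    p10k = (1 - pi) * (1 - (1 - p - q) ** k)
    return pi + pi * p10k / (k * (p + q) * (p10k + pi))

p = q = 0.05
vals = [probe_best_throughput(p, q, k) for k in range(1, 30)]
print(round(vals[5], 4))  # -> 0.7659 at k = 6
```

The values exceed π = 0.5 for every k and are monotonically decreasing in k, matching the remark following (2.36).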
The probe-best policy maximizes the immediate reward; however, its drawback is that when the probed channel is OFF, the transmitter has no knowledge of the state of the other channels as it searches for an ON channel, as described by Theorem 5. Consequently, the transmitter probes channels with belief π until an ON channel is found, resulting in a low expected reward.

2.4.2 Probe Second-Best Policy

Now, consider a simple alternative policy, the probe second-best policy, which at each probing instance probes the channel with the second-highest belief, and transmits on the channel with the highest belief after the channel probe. Consider channel state beliefs x_1, x_2, x_3, ... where x_1 ≥ x_2 ≥ ... ≥ x_i ≥ ... ≥ π. The probe-best policy of Section 2.4.1 probes the channel with belief x_1. If it is ON, the transmitter uses that channel (resulting in throughput equal to 1 for the next slot), and if it is OFF, the transmitter uses the channel with the next highest belief, x_2. Thus, the expected immediate reward of probing the best channel is given by

    x_1 + (1 - x_1)x_2 = x_1 + x_2 - x_1 x_2.    (2.37)

The probe second-best policy instead probes the channel with belief x_2. If this channel is ON, it transmits over that channel (resulting in throughput equal to 1), and otherwise transmits over the channel with the highest belief, x_1. The expected immediate reward of probing the second-best channel is equal to

    x_2 + (1 - x_2)x_1 = x_1 + x_2 - x_1 x_2.    (2.38)

Hence, the probe second-best policy has the same immediate reward as the probe-best policy. To understand how the probe second-best policy outperforms the probe-best policy, consider the following result, analogous to Theorem 5 for the probe-best policy.

Theorem 7. The state of the system is given by an infinite vector of beliefs, one for each channel. Without loss of generality, assume this vector is sorted as x = {x_1, x_2, ...} such that x_1 ≥ x_2 ≥ x_3 ≥ ... .
The class of recurrent states under the probe second-best policy satisfies x_1 ≥ x_2 ≥ π, and x_i = π for all other channels i ≠ 1, 2.

Proof. The probe second-best policy probes the channel with belief x_2. If this channel is ON, its belief becomes p_{11}^k at the next probe, and it becomes the channel with the highest belief, while the channel with belief x_1 becomes the second-best. If the channel is OFF, it is removed from the system as per the infinite-channel assumption. Therefore, the vector consisting of x_1 ≥ π and x_i = π for all i > 1 is reachable from any state. This state corresponds to the transmitter having information about only one channel. From this state, by probing an ON channel, the system transitions into a state with two channels having belief greater than π; however, the system can never have more than two channels with x_i > π.

By Theorem 7, since two channels can have belief greater than π under the probe second-best policy, when the policy probes an OFF channel, the transmitter uses the channel with the next highest belief while probing new channels to find another ON channel. This approach results in a higher expected throughput over that interval than under the probe-best policy, which transmits on a channel with belief equal to the steady-state probability π. It is this intuition that leads us to consider the probe second-best policy. The following theorem confirms our intuition, by showing that the probe second-best policy yields a higher throughput than the probe-best policy.

Figure 2-8: Illustration of the renewal process. Points represent probing instances, and labels represent probing results. Each renewal interval consists of phase 1 and phase 2.

Theorem 8. The average reward of the probe second-best policy is greater than that of the probe-best policy, for all fixed probing intervals k.

Proof.
Theorem 8 is proved using renewal theory to compute the average throughput of the probe second-best policy, and comparing it to that of the probe-best policy. The key to the proof is the definition of the renewal interval. We define a renewal to occur when the best channel has belief p_{11}^{2k}, and the second-best channel (and every other channel) has belief π. A renewal interval is divided into two phases: phase 1 includes all the channel probes until a probe results in an ON channel, and phase 2 includes the subsequent probes until an OFF channel is probed. The division of renewal intervals into phases is illustrated in Figure 2-8. In phase 1, the transmitter probes channels with belief π until an ON channel is probed, and in phase 2, the transmitter probes the second-best channel, with belief greater than π, until an OFF channel is probed. This definition ensures that the inter-renewal periods are i.i.d. The state evolution during a sample renewal interval is shown in Table 2.3.

    Time | Best Channel Belief | Second-Best Belief | Probe Result
    0    | p_{11}^{2k}         | π                  | 0
    k    | p_{11}^{3k}         | π                  | 0
    2k   | p_{11}^{4k}         | π                  | 1
    3k   | p_{11}^{k}          | p_{11}^{5k}        | 1
    4k   | p_{11}^{k}          | p_{11}^{2k}        | 1
    5k   | p_{11}^{k}          | p_{11}^{2k}        | 0
    6k   | p_{11}^{2k}         | π                  | -

Table 2.3: Example renewal interval starting at time 0 and renewing at time 6k. At each probing instance, the second-best channel is probed.

The expected inter-renewal time is given by kE[N_1 + N_2], where N_1 is the number of probes required to find an ON channel in phase 1, and is geometrically distributed with parameter π, and N_2 is the number of probes required until the next OFF probe in phase 2. The distribution of N_2 depends on N_1, and is given by

    N_2 = \begin{cases} 1 & \text{w.p. } p_{10}^{(N_1+2)k} \\ i & \text{w.p. } p_{11}^{(N_1+2)k} \, p_{10}^{2k} \, (p_{11}^{2k})^{i-2}, \quad i \ge 2. \end{cases}    (2.39)

Therefore,

    \bar{X}_{SB} = kE[N_1 + N_2] = k\Bigl(\frac{1}{\pi} + 1 + \frac{E[p_{11}^{(2+N_1)k}]}{p_{10}^{2k}}\Bigr).    (2.40)

During phase 1 of a renewal, the expected reward accumulated is given by

    \bar{R}_{SB}^1 = E\Bigl[\sum_{i=0}^{(N_1-1)k-1} p_{11}^{i+2k} + \sum_{i=0}^{k-1} p_{11}^i\Bigr].    (2.41)

The first term is the throughput obtained from transmitting over the best channel while searching for an ON channel; that channel starts with belief p_{11}^{2k}, which decays until an ON channel is found, as shown in Table 2.3. In phase 2, the expected reward is given by

    \bar{R}_{SB}^2 = E\Bigl[(N_2 - 1)\sum_{i=0}^{k-1} p_{11}^i + \sum_{i=0}^{k-1} p_{11}^{k+i}\Bigr].    (2.42)

For N_2 - 1 intervals of length k, the transmitter transmits over a channel that was probed ON, yielding throughput \sum_{i=0}^{k-1} p_{11}^i. Then, for the last interval prior to the renewal, the best channel has belief p_{11}^k, and the expected accumulated throughput over that interval is \sum_{i=0}^{k-1} p_{11}^{k+i}. The average reward per time slot is given by

    \frac{\bar{R}_{SB}^1 + \bar{R}_{SB}^2}{\bar{X}_{SB}} = \pi + \frac{\pi p_{10}^k (\pi + p_{10}^{2k})}{(p+q)k\bigl[\pi^2 + p_{10}^{2k}(1 - (1-p-q)^k + \pi)\bigr]}.    (2.43)

We can compute the difference between (2.43) and (2.30) from Theorem 6 as

    \frac{\bar{R}_{SB}^1 + \bar{R}_{SB}^2}{\bar{X}_{SB}} - \frac{\bar{R}_B}{\bar{X}_B} = \frac{\bigl((1-p-q)^k \pi p_{10}^k\bigr)^2}{k(p+q)(\pi + p_{10}^k)\bigl(\pi^2 + p_{10}^{2k}(\pi + 1 - (1-p-q)^k)\bigr)}.    (2.44)

Due to the positive-memory assumption, we have 0 ≤ (1-p-q)^k ≤ 1 for all k. Therefore, the expression in (2.44) is positive, completing the proof.

Figure 2-9: Comparison of the probe-best policy and the probe second-best policy for varying probing intervals k. In this example, p = q = 0.05.

Theorem 8 asserts that probing the channel with the second-highest belief is a better policy than probing the channel with the highest belief under fixed-interval probing policies. A numerical comparison between these two policies is shown in Figure 2-9. This result is in sharp contrast to the result in [2], which shows that probing the channel with the highest belief is optimal. In our model, when a probed channel is OFF, the transmitter uses its knowledge of the system to transmit over another channel believed to be ON. In the model of [2], when an OFF channel is probed, the transmitter cannot schedule a packet in that slot. This difference in the reward after probing leads to significantly different probing policies.
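Both closed forms can be evaluated directly (a sketch, under the same conventions π = p/(p+q) and p_{10}^k = (1-π)(1-(1-p-q)^k)); for p = q = 0.05 and k = 6 they give approximately 0.7659 and 0.7806, and the difference (2.44) is positive for every k:

```python
# Sketch: compare the probe-best throughput (2.30) with the probe
# second-best throughput (2.43). Conventions: pi = p/(p+q),
# p10^k = (1 - pi)*(1 - (1-p-q)**k).

def thpt_best(p, q, k):
    pi = p / (p + q)
    p10k = (1 - pi) * (1 - (1 - p - q) ** k)
    return pi + pi * p10k / (k * (p + q) * (p10k + pi))

def thpt_second_best(p, q, k):
    pi = p / (p + q)
    r = 1 - p - q
    p10k = (1 - pi) * (1 - r ** k)
    p10_2k = (1 - pi) * (1 - r ** (2 * k))
    return pi + (pi * p10k * (pi + p10_2k)
                 / ((p + q) * k * (pi ** 2 + p10_2k * (1 - r ** k + pi))))

print(round(thpt_best(0.05, 0.05, 6), 4),
      round(thpt_second_best(0.05, 0.05, 6), 4))  # -> 0.7659 0.7806

# Theorem 8: the second-best policy dominates for every probing interval k.
assert all(thpt_second_best(0.05, 0.05, k) >= thpt_best(0.05, 0.05, k)
           for k in range(1, 50))
```

These two values coincide with the theory entries reported in the simulation results of Section 2.4.4.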
Theorem 8 also supports Conjecture 1, which claims that the probe second-best policy is optimal among all policies.

Figure 2-10: Comparison of the probe-best policy and the probe second-best policy for varying state transition probabilities p = q. In this example, k = 1.

2.4.3 Round Robin Policy

It is of additional interest to consider a min-max policy, the round robin policy, which probes the channel about which the transmitter has the least knowledge. In a system with finitely many channels, the round robin policy probes all of the channels sequentially, always probing the channel which was probed longest ago. When the number of channels grows to infinity, the transmitter always probes a channel that has never been probed before. Consider channel state beliefs x_1, x_2, x_3, ... where x_1 ≥ x_2 ≥ ... ≥ x_i ≥ ... ≥ π. Under the round robin policy, a channel with belief π is probed; if that channel is ON it is used by the transmitter (earning throughput 1), and otherwise the channel with the highest belief is used (earning throughput x_1, the belief of the best channel). Thus, the immediate reward of round robin is given by

    \pi + (1 - \pi)x_1 = \pi + x_1 - \pi x_1.    (2.45)

Comparing (2.45) to (2.37), it is clear that the immediate reward of the round robin policy is less than that of the probe-best and probe second-best policies. Interestingly, the following theorem shows that the average per-slot throughput of the round robin policy is the same as that of the myopic probe-best policy.

Theorem 9. For all fixed k, the round robin policy has a per-slot average throughput of

    E[\text{Thpt}] = \pi + \frac{\pi p_{10}^k}{k(p+q)(p_{10}^k + \pi)},    (2.46)

the same as the probe-best policy.

Proof. Let a renewal occur every time a new channel is probed and found to be ON. Since the result of each probe is an i.i.d. random variable with parameter π, the inter-renewal intervals are i.i.d.
The inter-renewal time is X_{RR} = k · N, where k is the time between probes, and N is a geometric random variable with parameter π, the steady-state ON probability given in (2.4). Over that interval, the transmitter transmits over the last channel known to be ON, until a new ON channel is found. The expected reward earned over each renewal period is given by

    \bar{R}_{RR} = E\Bigl[\sum_{i=0}^{Nk-1} p_{11}^i\Bigr]    (2.47)
                 = E\Bigl[\pi N k + \frac{p_{10}^{Nk}}{p+q}\Bigr]    (2.48)
                 = k + \frac{p_{10}^k}{p + q - q(1-p-q)^k}.    (2.49)

Thus, the time-average reward is given by

    \frac{\bar{R}_{RR}}{\bar{X}_{RR}} = \pi + \frac{\pi p_{10}^k}{k(p+q)(\pi + p_{10}^k)},    (2.50)

which is the same as the reward of the probe-best policy in Theorem 6.

Recall from Theorem 5 that under the probe-best policy, at most one channel can have belief greater than π. In contrast, under the round robin policy many channels can have belief greater than π. Thus, Theorem 9 is surprising: the round robin policy trades off immediate reward for increased knowledge of the channel states, yet yields the same average throughput as the probe-best policy.

2.4.4 Simulation Results

In order to simulate an infinite-channel system, we consider a system of 500 channels and apply different probing policies at a fixed probing interval of 6 slots. We compute the average throughput obtained over the total horizon, as shown in Table 2.4.

    Policy            | Theory | Simulation
    Probe Best        | 0.7659 | 0.7657
    Probe Second-Best | 0.7806 | 0.7806
    Round Robin       | 0.7659 | 0.7662

Table 2.4: Throughput comparison for different probing policies with p = q = 0.05, k = 6. The simulation assumes 500 channels and a time horizon of 1,000,000 probes.

In this simulation, the probe second-best policy is optimal over all policies considered, while the probe-best and round robin policies have the same average throughput. Additionally, the analytical throughput derived in Section 2.4 is very close to that observed through simulation.
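The expectation step from (2.48) to (2.49) can be checked by Monte Carlo (a sketch, drawing N as a geometric random variable with parameter π = p/(p+q)):

```python
import random

# Sketch: Monte Carlo check of the step (2.48) -> (2.49), i.e. that
# E[pi*N*k + p10^(N*k)/(p+q)] = k + p10^k/(p + q - q*(1-p-q)**k)
# for N geometric with parameter pi. Conventions: pi = p/(p+q),
# p10^n = (1 - pi)*(1 - (1-p-q)**n).

def check(p, q, k, trials=200_000, seed=3):
    rng = random.Random(seed)
    pi = p / (p + q)
    r = 1 - p - q
    p10 = lambda n: (1 - pi) * (1 - r ** n)
    total = 0.0
    for _ in range(trials):
        n = 1
        while rng.random() >= pi:  # geometric(pi) number of probes
            n += 1
        total += pi * n * k + p10(n * k) / (p + q)
    closed_form = k + p10(k) / (p + q - q * r ** k)
    return total / trials, closed_form

mc, cf = check(0.05, 0.05, 6)
print(round(mc, 2), round(cf, 2))  # the two agree up to Monte Carlo noise
```

Dividing the closed form by the mean inter-renewal time k/π recovers the per-slot throughput (2.50), identical to the probe-best expression (2.30).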
2.5 Dynamic Optimization of Probing Intervals

Until this point, we have assumed the transmitter chooses channels to probe at predetermined probing intervals. However, an alternate approach is to optimize the time until the next channel probe dynamically, as a function of the collected CSI. For example, after an ON probe, the transmitter has knowledge of a channel yielding high throughput, and therefore may not need to probe a new channel immediately. On the other hand, if the probed channel is OFF, the transmitter may benefit from probing a new channel in the near future to make up for lost throughput. In this example, the optimal probing policy sets the probing interval dynamically, based on the result of the previous probe. In this section, the optimal dynamic probing policy is modeled as a stochastic control problem, where at each time slot a decision is made whether to probe a channel, and if so, which channel to probe.

2.5.1 Two-Channel System

To begin, consider a system with only two channels. The optimal channel probing problem is formulated as a Markov Decision Process (MDP), solved as a dynamic program (DP) over a finite horizon of length T. At each time slot, the system state is the vector consisting of the belief of each channel's state. After observing the system state at time t, the transmitter selects an action from a set of possible actions: probe channel 1, probe channel 2, or probe neither channel. Thus, the expected reward function at time slot t is given by

J_t(x1, x2) = max( J^0_t(x1, x2), J^1_t(x1, x2), J^2_t(x1, x2) ),    (2.51)

where J^0_t is the expected reward given that neither channel is probed at the current slot, and J^1_t and J^2_t are the expected reward functions given that channel 1 or channel 2 is probed, respectively. When the transmitter chooses not to probe either channel, the throughput obtained is the maximum of the channel beliefs, since the transmitter uses the better of the two channels. Assume channel probes incur a cost of c.
This channel cost represents the resources required to execute a channel probe, which take away from resources that could otherwise have been used for additional throughput. When a channel is probed and is ON, the transmitter uses that channel and a reward (throughput) of 1 is earned. On the other hand, if the probed channel is OFF, a unit throughput is earned only if the second channel is ON. Therefore, the terminal reward at time t = T is given by

J^0_T(x1, x2) = max(x1, x2),    (2.52)
J^1_T(x1, x2) = −c + x1 + (1 − x1)x2,    (2.53)
J^2_T(x1, x2) = −c + x2 + (1 − x2)x1.    (2.54)

For t < T, the reward function includes the expected future reward, based on the result of the channel probe. If the transmitter does not probe a channel, the state at the next slot is given by (τ(x1), τ(x2)), where τ(·) = τ^1(·) is the information decay function in (2.5). If a channel is probed, then the belief of that channel at the following slot is either p or 1 − q, depending on whether the probe finds the channel OFF or ON respectively. Thus, the recursive expected reward DP equations are given by

J^0_t(x1, x2) = max(x1, x2) + J_{t+1}( τ(x1), τ(x2) )    (2.55)
J^1_t(x1, x2) = −c + x1 + x2 − x1x2 + x1 J_{t+1}( 1 − q, τ(x2) ) + (1 − x1) J_{t+1}( p, τ(x2) )    (2.56)
J^2_t(x1, x2) = −c + x1 + x2 − x1x2 + x2 J_{t+1}( τ(x1), 1 − q ) + (1 − x2) J_{t+1}( τ(x1), p )    (2.57)

The maximizer of (2.51) is the optimal probing policy at time slot t as a function of the current state. Note that the state space is countably infinite, as each belief xi has a one-to-one mapping to an (S, k) pair, where S is the state at the last channel probe, and k is the time since the last probe.

Several observations can be made about the value function described in (2.51)–(2.57), as stated in the following lemmas.

Lemma 4 (Linearity). J^1_t(x1, x2) is linear in x1 for fixed x2, and similarly, J^2_t(x1, x2) is linear in x2 for fixed x1.

Proof.
We will prove the first half of this lemma here; the other half follows by symmetry. Let 0 ≤ λ ≤ 1. Then

J^1_t(λx1 + (1−λ)y1, x2)
 = −c + λx1 + (1−λ)y1 + x2 − (λx1 + (1−λ)y1)x2
   + (λx1 + (1−λ)y1) J_{t+1}( 1 − q, τ(x2) ) + (1 − λx1 − (1−λ)y1) J_{t+1}( p, τ(x2) )    (2.58)
 = λ(−c + x1 + x2 − x1x2) + (1−λ)(−c + y1 + x2 − y1x2)
   + λ[ x1 J_{t+1}( 1 − q, τ(x2) ) + (1 − x1) J_{t+1}( p, τ(x2) ) ]
   + (1−λ)[ y1 J_{t+1}( 1 − q, τ(x2) ) + (1 − y1) J_{t+1}( p, τ(x2) ) ]    (2.59)
 = λ J^1_t(x1, x2) + (1−λ) J^1_t(y1, x2).    (2.60)

Lemma 5 (Commutativity).

J_t(x1, x2) = J_t(x2, x1)    (2.61)

Proof. The proof is by reverse induction on t. For T, we have

J_T(x1, x2) = max( max(x1, x2), −c + x1 + x2 − x1x2, −c + x2 + x1 − x2x1 )    (2.62)
 = max( max(x2, x1), −c + x2 + x1 − x2x1, −c + x1 + x2 − x1x2 )    (2.63)
 = J_T(x2, x1).    (2.64)

Now assume (2.61) holds for time t + 1. Then we have

J^1_t(x1, x2) = −c + x1 + x2 − x1x2 + x1 J_{t+1}( 1 − q, τ(x2) ) + (1 − x1) J_{t+1}( p, τ(x2) )    (2.65)
 = −c + x2 + x1 − x2x1 + x1 J_{t+1}( τ(x2), 1 − q ) + (1 − x1) J_{t+1}( τ(x2), p )    (2.66)
 = J^2_t(x2, x1).    (2.67)

Additionally, we have

J^0_t(x1, x2) = max(x1, x2) + J_{t+1}( τ(x1), τ(x2) )    (2.68)
 = max(x2, x1) + J_{t+1}( τ(x2), τ(x1) )    (2.69)
 = J^0_t(x2, x1).    (2.70)

Finally, we can use these two results to show that

J_t(x1, x2) = max( J^0_t(x1, x2), J^1_t(x1, x2), J^2_t(x1, x2) )    (2.71)
 = max( J^0_t(x2, x1), J^2_t(x2, x1), J^1_t(x2, x1) )    (2.72)
 = J_t(x2, x1).    (2.73)

The proof follows by induction.

Let Φt(0), Φt(1), Φt(2) be the sets of states (x1, x2) for which it is optimal to not probe, probe channel 1, and probe channel 2, respectively, at time t.

Lemma 6 (Probe Symmetry). If (x1, x2) ∈ Φt(1), then (x2, x1) ∈ Φt(2).

Proof. If (x1, x2) ∈ Φt(1), then J^1_t(x1, x2) ≥ J^2_t(x1, x2) and J^1_t(x1, x2) ≥ J^0_t(x1, x2). Using Lemma 5, we can then say that J^2_t(x2, x1) ≥ J^1_t(x2, x1) and J^2_t(x2, x1) ≥ J^0_t(x2, x1), which implies (x2, x1) ∈ Φt(2).

Lemma 7 (No-Probe Symmetry).
If (x1, x2) ∈ Φt(0), then (x2, x1) ∈ Φt(0).

Proof. If (x1, x2) ∈ Φt(0), then J^0_t(x1, x2) ≥ J^1_t(x1, x2) and J^0_t(x1, x2) ≥ J^2_t(x1, x2). It follows from Lemma 5 that J^0_t(x1, x2) = J^0_t(x2, x1) and J^2_t(x1, x2) = J^1_t(x2, x1), which implies J^0_t(x2, x1) ≥ J^1_t(x2, x1). By a similar argument, we can show J^0_t(x2, x1) ≥ J^2_t(x2, x1), and therefore (x2, x1) ∈ Φt(0).

These last two lemmas show that the optimal decision regions are symmetric about the line x1 = x2. Lemmas 4–7 combine to prove a convexity result on the expected reward function.

Theorem 10 (Convexity). For all t, J_t(x1, x2) is convex in x1 for fixed x2, and is convex in x2 for fixed x1.

Proof. This is proved by reverse induction over t. For t = T,

J_T(x1, x2) = max( max(x1, x2), −c + x1 + x2 − x1x2, −c + x2 + x1 − x2x1 )    (2.74)

is convex in each element, since each argument to the maximum is convex (or affine) and the maximum of convex functions is also convex. Now consider t < T, and assume that J_{t+1}(x1, x2) is convex in x1 for fixed x2. To begin, we note that

τ(λx1 + (1−λ)y1) = (1 − q)(λx1 + (1−λ)y1) + p(1 − λx1 − (1−λ)y1)    (2.75)
 = (1 − q)λx1 + pλ(1 − x1) + (1 − q)(1−λ)y1 + p(1−λ)(1 − y1)    (2.76)
 = λτ(x1) + (1−λ)τ(y1).    (2.77)

First we consider the expected throughput after not probing:

J^0_t(λx1 + (1−λ)y1, x2)
 = max(λx1 + (1−λ)y1, x2) + J_{t+1}( τ(λx1 + (1−λ)y1), τ(x2) )    (2.78)
 ≤ λ max(x1, x2) + (1−λ) max(y1, x2) + J_{t+1}( λτ(x1) + (1−λ)τ(y1), τ(x2) )    (2.79)
 ≤ λ max(x1, x2) + (1−λ) max(y1, x2) + λ J_{t+1}( τ(x1), τ(x2) ) + (1−λ) J_{t+1}( τ(y1), τ(x2) )    (2.80)
 = λ J^0_t(x1, x2) + (1−λ) J^0_t(y1, x2),    (2.81)

where (2.79) holds by the convexity of max(·, x2) and the linearity of τ(·), and (2.80) holds from the induction hypothesis. Additionally, J^1_t(x1, x2) is convex in x1 by Lemma 4.
For J^2_t(x1, x2), we have:

J^2_t(λx1 + (1−λ)y1, x2)
 = −c + λx1 + (1−λ)y1 + x2 − (λx1 + (1−λ)y1)x2
   + x2 J_{t+1}( τ(λx1 + (1−λ)y1), 1 − q ) + (1 − x2) J_{t+1}( τ(λx1 + (1−λ)y1), p )    (2.82)
 = λ(−c + x1 + x2 − x1x2) + (1−λ)(−c + y1 + x2 − y1x2)
   + x2 J_{t+1}( λτ(x1) + (1−λ)τ(y1), 1 − q ) + (1 − x2) J_{t+1}( λτ(x1) + (1−λ)τ(y1), p )    (2.83)
 ≤ λ(−c + x1 + x2 − x1x2) + (1−λ)(−c + y1 + x2 − y1x2)
   + λ[ x2 J_{t+1}( τ(x1), 1 − q ) + (1 − x2) J_{t+1}( τ(x1), p ) ]
   + (1−λ)[ x2 J_{t+1}( τ(y1), 1 − q ) + (1 − x2) J_{t+1}( τ(y1), p ) ]    (2.84)
 = λ J^2_t(x1, x2) + (1−λ) J^2_t(y1, x2).    (2.85)

Thus, each of J^0_t(x1, x2), J^1_t(x1, x2), and J^2_t(x1, x2) is convex in x1 for fixed x2, and therefore J_t(x1, x2) is convex in x1 as well. The second half of the statement holds by symmetry.

Using the convexity of the expected reward function, we can find sufficient conditions for the optimality of probing in a given state.

Theorem 11. If at any time slot t the system state (x1(t), x2(t)) satisfies

c ≤ min( x1(t), x2(t) )( 1 − max( x1(t), x2(t) ) ),    (2.86)

then it is optimal to probe at slot t.

Proof.

J^0_t(x1, x2) = max(x1, x2) + J_{t+1}( τ(x1), τ(x2) )    (2.87)
 ≤ max(x1, x2) + x1 J_{t+1}( 1 − q, τ(x2) ) + (1 − x1) J_{t+1}( p, τ(x2) )    (2.88)
 = max(x1, x2) + J^1_t(x1, x2) + c − x1 − x2 + x1x2,    (2.89)

where (2.88) follows from Theorem 10. Therefore, J^0_t(x1, x2) − J^1_t(x1, x2) ≤ 0 if

c − x1 − x2 + x1x2 + max(x1, x2) ≤ 0,    (2.90)

that is, if

c ≤ min(x1, x2)( 1 − max(x1, x2) ).    (2.91)

While the convexity bound yields sufficient conditions for probing optimality, necessary conditions do not follow directly from this analysis. Additionally, the convexity bound used in (2.88) is loose, and thus probing is often optimal even in states which do not satisfy the condition of Theorem 11.

2.5.2 State Action Frequency Formulation

The channel probing MDP can also be modeled as an infinite-horizon, average-cost problem.
In this case, it can be formulated as a linear program (LP) in terms of state action frequencies, which can be solved to determine the optimal policy. A state action frequency ω(s; a) exists for each state and potential action, and corresponds to a stationary randomized policy such that ω(s; a) equals the steady-state probability that at a given time slot the state is s and the action taken is a. Let s = (s1, k1, s2, k2), where s1 and s2 are the last known states of the two channels, and k1 and k2 are the respective times since the last probe of each channel. We use this notation, rather than the belief notation, to emphasize the countable nature of the state space. Furthermore, the action a satisfies a ∈ {0, 1, 2}, representing the actions of not probing, probing channel 1, and probing channel 2, respectively. As mentioned above, the state space is countably infinite, and therefore the resulting state action frequency LP is intractable; however, we can approximate the optimal solution by truncating the state space at a large finite value. In particular, assume that ki takes values between 0 and Kmax, where Kmax is a predefined constant. When ki = Kmax and channel i is not probed, let ki = Kmax at the next slot as well. Clearly, as Kmax increases, p^{Kmax}_{11} approaches π, and the truncated formulation approaches the countable state space formulation. Since the belief of each channel approaches steady state exponentially fast, this truncation method can be used to find a near-optimal solution to the stochastic control problem. See [31] for details. The state action frequency formulation is presented in (2.92)–(2.103):

Max.  Σ_{s1,s2,k1,k2} Σ_a ω(s1, k1, s2, k2; a) r(s1, k1, s2, k2; a)    (2.92)

s.t.  Σ_{s1,s2,k1,k2} Σ_a ω(s1, k1, s2, k2; a) = 1    (2.93)
Σ_a ω(s1, 1, s2, k2; a) = Σ_{k1=1}^{Kmax} Σ_{s1'} ω(s1', k1, s2, k2 − 1; 1) p^{k1}_{s1',s1},   ∀ s1, s2, 2 ≤ k2 ≤ Kmax − 1    (2.94)

Σ_a ω(s1, k1, s2, 1; a) = Σ_{k2=1}^{Kmax} Σ_{s2'} ω(s1, k1 − 1, s2', k2; 2) p^{k2}_{s2',s2},   ∀ s1, s2, 2 ≤ k1 ≤ Kmax − 1    (2.95)

Σ_a ω(s1, 1, s2, Kmax; a) = Σ_{k1=1}^{Kmax} Σ_{s1'} p^{k1}_{s1',s1} [ ω(s1', k1, s2, Kmax − 1; 1) + ω(s1', k1, s2, Kmax; 1) ],   ∀ s1, s2    (2.96)

Σ_a ω(s1, Kmax, s2, 1; a) = Σ_{k2=1}^{Kmax} Σ_{s2'} p^{k2}_{s2',s2} [ ω(s1, Kmax − 1, s2', k2; 2) + ω(s1, Kmax, s2', k2; 2) ],   ∀ s1, s2    (2.97)

Σ_a ω(s1, k1, s2, k2; a) = ω(s1, k1 − 1, s2, k2 − 1; 0),   ∀ s1, s2, 2 ≤ k1, k2 ≤ Kmax − 1    (2.98)

Σ_a ω(s1, Kmax, s2, k2; a) = ω(s1, Kmax − 1, s2, k2 − 1; 0) + ω(s1, Kmax, s2, k2 − 1; 0),   ∀ s1, s2, 2 ≤ k2 ≤ Kmax − 1    (2.99)

Σ_a ω(s1, k1, s2, Kmax; a) = ω(s1, k1 − 1, s2, Kmax − 1; 0) + ω(s1, k1 − 1, s2, Kmax; 0),   ∀ s1, s2, 2 ≤ k1 ≤ Kmax − 1    (2.100)

Σ_a ω(s1, Kmax, s2, Kmax; a) = ω(s1, Kmax, s2, Kmax; 0) + ω(s1, Kmax − 1, s2, Kmax − 1; 0) + ω(s1, Kmax − 1, s2, Kmax; 0) + ω(s1, Kmax, s2, Kmax − 1; 0),   ∀ s1, s2    (2.101)

r(s1, k1, s2, k2; a) = −c + p^{k1}_{s1,1} + p^{k2}_{s2,1} − p^{k1}_{s1,1} p^{k2}_{s2,1},   ∀ a ∈ {1, 2}    (2.102)

r(s1, k1, s2, k2; 0) = max( p^{k1}_{s1,1}, p^{k2}_{s2,1} )    (2.103)

Equation (2.92) is the objective, maximizing the average reward, where the reward functions are defined for each possible action in (2.102) and (2.103). Equation (2.93) is a normalization constraint, ensuring that the state action frequencies sum to one. Equations (2.98) through (2.101) are balance equations for the case when the chosen action is to not probe; note that we include additional constraints to deal with the truncation of the state space. Equations (2.94) and (2.96) describe the evolution of the state when channel 1 is probed, and equations (2.95) and (2.97) the case when channel 2 is probed.
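The role of the state action frequencies can be illustrated on a toy example before specializing to the channel model. The sketch below (a minimal Python illustration of our own; the two-state MDP and the randomized policy are arbitrary) computes the frequencies ω(s; a) induced by a fixed stationary policy and verifies that they satisfy a normalization constraint and balance equations analogous to (2.93) and (2.94)–(2.101).

```python
# Toy 2-state, 2-action MDP (hypothetical numbers, not the channel model):
# P[a][s][s2] is the probability of moving from state s to s2 under action a.
P = {0: [[0.9, 0.1], [0.5, 0.5]],
     1: [[0.2, 0.8], [0.6, 0.4]]}
policy = [[0.7, 0.3], [0.4, 0.6]]   # policy[s][a]: prob. of action a in state s

# Stationary distribution of the policy-induced chain, by power iteration.
mu = [0.5, 0.5]
for _ in range(5000):
    nxt = [0.0, 0.0]
    for s in range(2):
        for a in range(2):
            for s2 in range(2):
                nxt[s2] += mu[s] * policy[s][a] * P[a][s][s2]
    mu = nxt

# State action frequencies: omega(s; a) = mu(s) * policy(a | s).
omega = [[mu[s] * policy[s][a] for a in range(2)] for s in range(2)]

# Normalization, the analogue of (2.93): the frequencies sum to one.
assert abs(sum(map(sum, omega)) - 1.0) < 1e-9

# Balance, the analogue of (2.94)-(2.101): flow out of each state equals
# the probability-weighted flow into it.
for s2 in range(2):
    inflow = sum(omega[s][a] * P[a][s][s2] for s in range(2) for a in range(2))
    assert abs(sum(omega[s2]) - inflow) < 1e-9
```

Conversely, solving the LP for the ω(s; a) and normalizing by Σ_a ω(s; a) recovers the optimal randomized policy, which is the direction used in the formulation above.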
For weakly communicating finite state and action MDPs, there exists a solution to the state action frequency LP that corresponds to a deterministic stationary policy [31]. Specifically, for every recurrent state s under the solution, ω(s; a) > 0 for some a, and since the optimal policy is deterministic, ω(s; a) > 0 holds for only one value of a, which is the optimal decision at that state; ω(s; a) = 0 for all other actions. Since transient states are visited only finitely often, they have zero state action frequencies for every action. Note that solving the SAF LP with a simplex-based solver, e.g. CPLEX, returns the deterministic solution. The solution to the state action frequency LP for sample parameters is shown in Figure 2-11. This plot shows the optimal decision as a function of the belief of channel 1 (x1) and the belief of channel 2 (x2). The system state can only reach a countable subset of the points on the x1-x2 plane. Under any policy, except for the policy where a channel is never probed, there is a single recurrent class of states, and only states in this class have non-zero state action frequencies. From any recurrent state, if the optimal decision is not to probe, the system state moves to the next state (τ(x1), τ(x2)). The coordinates (τ^k(x1), τ^k(x2)) trace a line between (x1, x2) and (π, π), parameterized by k. Thus, while the transmitter refrains from probing, the system state follows a trajectory from the current state toward (π, π). Based on this observation, and the results in Figure 2-11, we can characterize the structure of the optimal probing algorithm. For a given set of parameters, there exists a probing region, e.g.
the dotted convex region in Figure 2-11, and a point (π, π), denoted by the dot in the center of Figure 2-11.

Figure 2-11: Optimal decisions based on SAFs (p = 0.03, q = 0.03, c = 0.5; axes: belief of channel 1 vs. belief of channel 2). White space corresponds to transient states under the optimal policy, and green circles, red boxes, and blue stars correspond to recurrent states where the optimal action is to not probe, probe channel 1, and probe channel 2 respectively.

At each time slot, if the current state lies outside of the probing region, the optimal decision is to not probe, and the state moves along the linear trajectory toward (π, π). When the state lies on or inside the probing region, the controller probes one of the channels. The state reached after the channel probe corresponds to a point on the edge of the unit square in Figure 2-11, since the belief of a probed channel is either 0 or 1. Then the process repeats, and the state follows a new trajectory toward the point (π, π). Therefore, the region for which probing is optimal translates to a threshold policy, where probing becomes optimal after a certain time, given by the distance between the point on the edge of the unit square and the probing region. If the point (π, π) lies outside of the probing region, then there exists a trajectory to (π, π) that does not intersect the probing region. If this is the case, the state after a channel probe will eventually be a point on the unit square such that the line between that point and (π, π) does not cross the probing region; the optimal decision is then to never probe, and the state monotonically approaches (π, π) along the linear trajectory. In this situation, all states are transient under the optimal policy.
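The decision regions of Figure 2-11 can also be approximated directly from the finite-horizon recursion (2.51)–(2.57), without the LP, by evaluating the value functions on a discretized belief grid. The sketch below is a minimal Python illustration (the grid resolution, horizon, and snapping of τ(x), p, and 1 − q to grid points are our own approximations, not the thesis's method):

```python
def probing_regions(p, q, c, n=41, horizon=30):
    """Value iteration for (2.51)-(2.57) on an n x n belief grid; beliefs
    produced by tau(.) and by probes (p and 1-q) are snapped to the grid."""
    xs = [i / (n - 1) for i in range(n)]
    snap = lambda x: min(range(n), key=lambda i: abs(xs[i] - x))
    tau = lambda x: x * (1 - q) + (1 - x) * p           # belief decay, cf. (2.5)
    g_on, g_off = snap(1 - q), snap(p)                  # post-probe beliefs
    nxt = [snap(tau(x)) for x in xs]                    # no-probe successor index

    # Terminal reward, cf. (2.52)-(2.54).
    J = [[max(max(x1, x2), -c + x1 + x2 - x1 * x2) for x2 in xs] for x1 in xs]
    act = [[0] * n for _ in range(n)]
    for _ in range(horizon):
        Jn = [[0.0] * n for _ in range(n)]
        for i, x1 in enumerate(xs):
            for j, x2 in enumerate(xs):
                j0 = max(x1, x2) + J[nxt[i]][nxt[j]]                         # (2.55)
                r = -c + x1 + x2 - x1 * x2
                j1 = r + x1 * J[g_on][nxt[j]] + (1 - x1) * J[g_off][nxt[j]]  # (2.56)
                j2 = r + x2 * J[nxt[i]][g_on] + (1 - x2) * J[nxt[i]][g_off]  # (2.57)
                Jn[i][j] = max(j0, j1, j2)
                act[i][j] = [j0, j1, j2].index(Jn[i][j])
        J = Jn
    return J, act   # act[i][j]: optimal first action at belief (xs[i], xs[j])
```

The resulting action map should exhibit the qualitative structure of Figure 2-11 (a region of states where probing is optimal, surrounded by states where waiting is optimal), and the commutativity of Lemma 5 carries over exactly to the discretized recursion.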
In summary, the optimal time between probes is given by the distance between the state immediately following a probe and the state on the boundary of the probing region lying on the line between the current state and (π, π). To find the probing region, and the decision to make at each point of the probing region, the SAF LP in (2.92)–(2.103) must be solved.

2.5.3 Infinite-Channel System

For a system with more than two channels, the previous approaches can be used to formulate the problem of finding the optimal probing intervals. The drawback of these approaches is that the state space grows exponentially with the number of channels, and it becomes impractical to solve the MDP of Section 2.5.1 or the state action frequency LP of Section 2.5.2. However, in the asymptotic limit of the number of channels, the infinite-channel assumption of Section 2.4 can be applied to greatly simplify the state space, and new approaches can be developed to characterize the optimal probing intervals. The optimal intervals are related to the underlying probing policy used to select the channels to probe. In this section, we consider two of the channel probing policies from Section 2.4, the probe best policy and the round robin policy, and characterize the optimal intervals at which to probe.

To begin, assume the decision of which channel to probe is given by the probe best policy. The optimal probing interval is characterized by the following theorem.

Theorem 12. For a system in which the transmitter only probes the channel with the highest belief, the optimal probing decision is to probe immediately after probing an OFF channel, and to probe k* slots after probing an ON channel, where k* is given by

k* = arg max_k  [ π p^k_{10} / (p + q) − c( π + p^k_{10} ) ] / ( kπ + p^k_{10} ).    (2.104)

Proof. As a result of Theorem 5, under the probe best policy, the belief of the best channel x1 at every slot satisfies x1 ≥ π, and the belief of every other channel equals π.
When a probed channel is OFF, it is removed from the system, and the belief of every channel is π, representing a state in which the transmitter has no knowledge of the system. The system remains in this state until an ON channel is found, as each OFF channel which is probed is removed from the system. If the optimal decision in this state is to not probe, then the transmitter never probes, since the state never changes. Thus, if it is optimal to probe in the state where the transmitter has no knowledge, then it is optimal to probe immediately after an OFF channel is probed. When a probed channel is ON, the highest belief is always 1 − q in the next slot, and decays until that channel is probed again, as it always remains the channel with the highest belief. Hence, there exists a threshold k* after an ON probe such that after that time, it becomes optimal to probe.

Assume a probe occurs in the slot immediately after probing an OFF channel, and let k denote the number of slots after probing an ON channel until the best channel is probed again. Define a renewal to occur when the transmitter probes an OFF channel. It follows that the inter-renewal time is one slot if the next probed channel is OFF, and 1 + kN if the probed channel is ON, where N is a random variable equal to the number of times the ON channel is probed until it turns OFF. Thus, the expected inter-renewal time is given by

X̄_B = (1 − π) + π( 1 + kE[N] )    (2.105)
    = 1 + πkE[N].    (2.106)

The random variable N is geometrically distributed with parameter p^k_{10}. The reward accumulated over this interval is π if the probed channel is OFF; if the channel is ON, it is N times Σ_{i=0}^{k−1} p^i_{11}, plus an additional π after the final OFF probe. A cost of c is incurred for each channel probe within this interval. The expected reward is given by

R̄_B = (1 − π)(π − c) + π( E[N]( Σ_{i=0}^{k−1} p^i_{11} − c ) + π − c )    (2.107)
    = (π − c) + π E[N]( Σ_{i=0}^{k−1} p^i_{11} − c ).    (2.108)
Therefore, the average per-time-slot reward is given by the ratio of the expected reward over a renewal interval to the expected length of the renewal interval:

R̄_B / X̄_B = [ p^k_{10}(π − c) + π( Σ_{i=0}^{k−1} p^i_{11} − c ) ] / ( p^k_{10} + kπ )    (2.109)
 = π − c( π + p^k_{10} ) / ( kπ + p^k_{10} ) + π p^k_{10} / ( (p + q)( kπ + p^k_{10} ) ).    (2.110)

The maximizing value of k in equation (2.110) is the optimal time k* to wait after an ON probe.

Theorem 12 characterizes the optimal probing interval under the probe best policy. If the probing policy changes, the optimal interval changes as well. Nevertheless, the following result shows that under the round robin policy, the optimal probing interval has a similar structure.

Theorem 13. For a system in which the transmitter probes channels according to the round robin policy, the optimal decision is to probe a new channel immediately after probing an OFF channel, and to probe k' slots after probing an ON channel, where k' is given by

k' = arg max_k  [ −c(p + q) + p E_N[ Σ_{i=0}^{k+N−2} p^i_{11} ] ] / ( p(k − 1) + p + q ),    (2.111)

where N is a geometrically distributed random variable with parameter π.

Proof. In contrast to Theorem 12, there is no analog to Theorem 5 for round robin probing. Thus, we first prove that the optimal policy has a threshold form, by proving the monotonicity of the expected reward function. Given the structure of the optimal policy, renewal theory is applied to characterize the optimal interval. To begin, we write the expected reward for probing k slots after the previous probe as a function of k over a finite horizon:

J_T(k) = max( p^k_{11}, −c + π + (1 − π)p^k_{11} )    (2.112)
J_t(k) = max( p^k_{11} + J_{t+1}(k + 1), −c + π( 1 + J_{t+1}(1) ) + (1 − π)( p^k_{11} + J_{t+1}(k + 1) ) )    (2.113)

where the left argument of max(·, ·) represents the expected reward from not probing, and the right argument represents the expected reward from probing an unknown channel. Under round robin probing, J_t is monotonically decreasing in k for all t.
To see this, assume t = T, and first suppose k satisfies π p^k_{10} ≥ c. Then

J_T(k) = max( p^k_{11}, −c + π + (1 − π)p^k_{11} )    (2.114)
 = p^k_{11} + max( 0, −c + π p^k_{10} )
 = p^k_{11} − c + π( 1 − p^k_{11} )
 = p^k_{11}(1 − π) + π − c,    (2.115)

which is monotonically decreasing in k, since p^k_{11} is a monotonically decreasing function of k. If, on the other hand, π p^k_{10} ≤ c, then J_T(k) = p^k_{11}, which is monotonically decreasing in k.

Now assume t ≤ T and that the hypothesis holds for t + 1, ..., T; we show by induction that it holds for t. Let g(k) = p^k_{11} + J_{t+1}(k + 1). By induction, g(k) is monotonically decreasing in k, and using the analysis from the base case, the expression

J_t(k) = max( g(k), −c + π( 1 + J_{t+1}(1) ) + (1 − π)g(k) )    (2.116)

is also monotonically decreasing in k.

The remaining proof of Theorem 13 follows by reverse induction over the time horizon. Assume there is a k' such that it is optimal to probe at time T, and consider k ≥ k'. It is optimal to probe if c ≤ π p^k_{10}. However, c ≤ π p^{k'}_{10} since it is optimal to probe at k', and p^k_{10} ≥ p^{k'}_{10}; therefore, it is also optimal to probe at k. Now consider t ≤ T, and assume the induction hypothesis holds for t + 1. The difference between the arguments of max(·, ·) in (2.113) can be bounded as follows:

−c + π( 1 + J_{t+1}(1) ) + (1 − π)( p^k_{11} + J_{t+1}(k + 1) ) − p^k_{11} − J_{t+1}(k + 1)    (2.117)
 = −c + π( 1 + J_{t+1}(1) ) − π( p^k_{11} + J_{t+1}(k + 1) )    (2.118)
 ≥ −c + π( 1 + J_{t+1}(1) ) − π( p^{k'}_{11} + J_{t+1}(k' + 1) )    (2.119)
 ≥ 0,    (2.120)

where the first inequality holds by the monotonicity of the J function, and the second holds by the assumption that it is optimal to probe at k'. Therefore, it is optimal to probe at t, and by induction, it is optimal to probe k' slots after an ON probe, for some value of k'.

To characterize the optimal value of k', we apply renewal theory using the renewals defined in Section 2.4.3. Recall that a renewal occurs upon probing a channel which is ON.
The expected time until the next renewal is the k' slots until the next probe, plus the number of slots it takes to find a new ON channel. Let N be the number of probes until an ON channel is found, which is geometrically distributed with parameter π. The expected inter-renewal time is given by

X̄_R = E_N[ k + N − 1 ].    (2.121)

Over this interval, a cost of c is incurred for each of the N channel probes, and at each time slot the transmitter uses the last known ON channel for transmission. Thus, the expected reward is given by

R̄_R = E_N[ 1 − Nc + Σ_{i=0}^{k+N−2} p^i_{11} ].    (2.122)

To determine the optimal k', we maximize the ratio of the expected reward to the expected length of the renewal interval, thus concluding the proof.

Note that the optimal time to wait after an ON probe under round robin (k') in (2.111) differs from the optimal k* under the probe best policy in (2.104). Figure 2-13 plots the average reward of round robin and probe best for different values of k. Recall that under fixed probing intervals, Theorem 9 states that both policies have the same average reward. However, under dynamic probing intervals, the probe best policy outperforms the round robin policy. Figure 2-12 shows a comparison between the expected throughput of the optimal fixed-interval probing policy and the optimal dynamic-interval policy under probe best and round robin. By examining the maxima in these graphs, we observe that for the chosen parameters, introducing a dynamic probing-interval optimization yields an 8% gain in throughput under probe best, and a 5% gain under round robin.

Based on the results of the fixed probing interval model, a natural extension to the above analysis is to consider the probe second-best policy, which was conjectured to be the optimal probing policy under fixed channel probing intervals.
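The maximization in (2.104) has no closed-form solution in k, but the per-slot reward (2.110) is cheap to evaluate for each candidate k, so k* can be found by direct search. A minimal Python sketch (function names ours), using the parameters of Figure 2-13:

```python
def probe_best_reward(k, p, q, c):
    """Per-slot reward (2.110): probe the best channel k slots after each
    ON probe, and immediately after each OFF probe."""
    pi = p / (p + q)
    p10k = (1 - pi) * (1 - (1 - p - q) ** k)    # k-step ON -> OFF probability
    denom = k * pi + p10k
    return pi - c * (pi + p10k) / denom + pi * p10k / ((p + q) * denom)

def optimal_interval(p, q, c, k_max=200):
    """Direct search for the maximizer k* of (2.110) over 1..k_max."""
    return max(range(1, k_max + 1), key=lambda k: probe_best_reward(k, p, q, c))
```

For p = q = 0.1 and c = 0.5, the parameters of Figure 2-13, this search returns k* = 5, with a per-slot reward of roughly 0.649 by this formula. The same direct-search approach applies to the round robin objective (2.111), with the expectation over the geometric random variable N evaluated numerically.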
In contrast to probe best and round robin, the optimal time until the next probe under the probe second-best policy depends on the belief of the best channel after an ON channel is probed; consequently, probe second-best does not have a single solution for the optimal probing interval after an ON channel has been probed. Thus, characterizing the optimal probing intervals is a more challenging problem in this context. It is an interesting and open problem to determine whether the probe second-best policy is still optimal under dynamic probing intervals.

Figure 2-12: Comparison of the expected throughput of (a) the probe best policy and (b) the round robin policy under fixed intervals and under dynamic intervals. The x-axis plots k, the length of the interval. The maximum of each graph represents the optimal policy in each regime. In this example, p = q = 0.05 and c = 0.5.

Figure 2-13: Comparison of the probe best policy and round robin for varying values of k, the minimum interval between probes. In this example, p = q = 0.1 and c = 0.5.

2.6 Conclusion

This chapter focuses on channel probing as a means of acquiring network state information, and optimizes the acquisition of this information in terms of which channels to probe and how often to probe them. In contrast to the work in [2, 71], which established the optimality of the myopic probe best policy, we showed that for a slightly modified model these results no longer hold. For a two-channel system, we proved that probing either channel results in the same throughput, and for an infinite-channel system, we proved that a simple alternative, the probe second-best policy, outperforms the probe best policy in terms of average throughput. We proved the optimality of the probe second-best policy in three-channel systems, and conjecture that probing the second-best channel is the optimal decision in a general multi-channel system.
Proving this conjecture is interesting, and remains an open problem. Additionally, we showed that dynamically optimizing the probing intervals based on the results of the channel probes can further increase system throughput. We characterized the optimal probing intervals in a two-channel system by formulating a Markov decision problem, and used a state action frequency approach to solve the dynamic program. For the infinite-channel case, we characterized the optimal probing intervals subject to a fixed probing policy, namely the probe best policy and the round robin probing policy. An extension to general probing policies, as well as a joint optimization over the probing decisions and the probing intervals, is an interesting direction for future work.

2.7 Appendix

2.7.1 Proof of Lemma 1

Lemma 1: f^k(x1, x2) = f^k(x2, x1).

Proof of Lemma 1.

f^k(x2, x1) = x2 Σ_{i=0}^{k−1} p^i_{11} + (1 − x2) Σ_{i=0}^{k−1} τ^i(x1)
 = Σ_{i=0}^{k−1} [ x2 p^i_{11} + (1 − x2) τ^i(x1) ]    (2.123)
 = Σ_{i=0}^{k−1} [ x2 p^i_{11} + (1 − x2)( x1 p^i_{11} + (1 − x1) p^i_{01} ) ]    (2.124)
 = Σ_{i=0}^{k−1} [ x1 p^i_{11} + (1 − x1)( x2 p^i_{11} + (1 − x2) p^i_{01} ) ]    (2.125)
 = Σ_{i=0}^{k−1} [ x1 p^i_{11} + (1 − x1) τ^i(x2) ] = f^k(x1, x2).    (2.126)

Equation (2.124) follows from (2.5), and (2.126) follows from (2.7).

2.7.2 Proof of Theorem 1

Theorem 1: For a two-user system with independent channels evolving over time according to an ON/OFF Markov chain with transition probabilities p and q, and probing epochs fixed at intervals of k slots, for each channel probe the total reward from probing channel 1 is equal to that of probing channel 2.

Proof of Theorem 1. This proof uses reverse induction on the probing index n. As a base case, consider n = N:

J^1_N(x1, x2) = f^k(x1, x2) = f^k(x2, x1) = J^2_N(x1, x2).    (2.127)

Now assume J^1_{n+1}(x1, x2) = J^2_{n+1}(x1, x2); we prove that the same holds for index n. First, we note that the function f^k(x1, x2) is affine in both x1 and x2. To see this, consider 0 ≤ λ ≤ 1.
λ f^k(a, x2) + (1 − λ) f^k(b, x2)
 = Σ_{i=0}^{k−1} [ λa p^i_{11} + λ(1 − a) τ^i(x2) + (1 − λ)b p^i_{11} + (1 − λ)(1 − b) τ^i(x2) ]    (2.128)
 = Σ_{i=0}^{k−1} [ p^i_{11}( λa + (1 − λ)b ) + τ^i(x2)( λ(1 − a) + (1 − λ)(1 − b) ) ]    (2.129)
 = f^k( λa + (1 − λ)b, x2 ).    (2.130)

As a consequence of Lemma 1, it also follows that

λ f^k(x1, a) + (1 − λ) f^k(x1, b) = f^k( x1, λa + (1 − λ)b ).    (2.131)

Using the above fact, we can show that both J^1_{n+1} and J^2_{n+1} are affine as well:

λ J^1_{n+1}(a, x2) + (1 − λ) J^1_{n+1}(b, x2)
 = λ[ f^k(a, x2) + a J_{n+2}( p^k_{11}, τ^k(x2) ) + (1 − a) J_{n+2}( p^k_{01}, τ^k(x2) ) ]
   + (1 − λ)[ f^k(b, x2) + b J_{n+2}( p^k_{11}, τ^k(x2) ) + (1 − b) J_{n+2}( p^k_{01}, τ^k(x2) ) ]    (2.132)
 = f^k( λa + (1 − λ)b, x2 ) + ( λa + (1 − λ)b ) J_{n+2}( p^k_{11}, τ^k(x2) )
   + ( 1 − λa − (1 − λ)b ) J_{n+2}( p^k_{01}, τ^k(x2) )    (2.133)
 = J^1_{n+1}( λa + (1 − λ)b, x2 ).    (2.134)

By the induction hypothesis, since J^1_{n+1}(x1, x2) = J^2_{n+1}(x2, x1), it follows that J^2_{n+1} is affine in x2 as well. Using the results above, J^1_n(x1, x2) is written as

J^1_n(x1, x2) = f^k(x1, x2) + x1 J_{n+1}( p^k_{11}, τ^k(x2) ) + (1 − x1) J_{n+1}( p^k_{01}, τ^k(x2) )    (2.135)
 = f^k(x1, x2) + x1 J^1_{n+1}( p^k_{11}, τ^k(x2) ) + (1 − x1) J^1_{n+1}( p^k_{01}, τ^k(x2) )    (2.136)
 = f^k(x1, x2) + J^1_{n+1}( τ^k(x1), τ^k(x2) )    (2.137)
 = f^k(x1, x2) + J^2_{n+1}( τ^k(x1), τ^k(x2) )    (2.138)
 = f^k(x2, x1) + x2 J^2_{n+1}( τ^k(x1), p^k_{11} ) + (1 − x2) J^2_{n+1}( τ^k(x1), p^k_{01} )    (2.139)
 = f^k(x2, x1) + x2 J_{n+1}( τ^k(x1), p^k_{11} ) + (1 − x2) J_{n+1}( τ^k(x1), p^k_{01} )    (2.140)
 = J^2_n(x1, x2),    (2.141)

where equations (2.136), (2.138), and (2.140) follow from the induction hypothesis, and equations (2.137) and (2.139) use the affinity of J^i_{n+1} and Lemma 1.

2.7.3 Proof of Theorem 3

Theorem 3: For a two-user system with channel states evolving as described above, and probing instances fixed to intervals of k slots, if p1, p2, q1, q2 satisfy

b^i_{11} ≥ a^i_{11}   ∀i,    (2.142)

then the optimal probing strategy is to probe channel 2 at all probing instances.
Proof of Theorem 3. This proof will use induction on the horizon length of the corresponding DP problem. Define state transition functions
\begin{align}
\tau_1^i(x) &= a_{11}^i x + (1 - x) a_{01}^i \tag{2.143} \\
\tau_2^i(x) &= b_{11}^i x + (1 - x) b_{01}^i \tag{2.144}
\end{align}
Base Case: Assume n = N. For this immediate-reward problem, the expected reward functions simplify to the following:
\begin{align}
J_N^1(x_1, x_2) &= \sum_{i=0}^{k-1} x_1 \max(a_{11}^i, \tau_2^i(x_2)) + (1 - x_1)\max(a_{01}^i, \tau_2^i(x_2)) \tag{2.145} \\
J_N^2(x_1, x_2) &= \sum_{i=0}^{k-1} x_2 \max(b_{11}^i, \tau_1^i(x_1)) + (1 - x_2)\max(b_{01}^i, \tau_1^i(x_1)) \tag{2.146}
\end{align}
Since we have assumed that $b_{11}^i \geq a_{11}^i$, the following inequalities hold:
\begin{equation}
b_{11}^i \geq a_{11}^i \geq \tau_1^i(x_1), \qquad b_{01}^i \leq a_{01}^i \leq \tau_1^i(x_1) \tag{2.147}
\end{equation}
Consequently, we can rewrite (2.146) as
\begin{align}
J_N^2(x_1, x_2) &= \sum_{i=0}^{k-1} x_2 b_{11}^i + (1 - x_2)\tau_1^i(x_1) \nonumber \\
&= \sum_{i=0}^{k-1} x_2 b_{11}^i + (1 - x_2) x_1 a_{11}^i + (1 - x_2)(1 - x_1) a_{01}^i \tag{2.148}
\end{align}
We consider two separate cases depending on whether $x_2 \geq \pi_2$ or $x_2 < \pi_2$.

Case 1: $x_2 \geq \pi_2$. Equation (2.145) simplifies to
\begin{align}
J_N^1(x_1, x_2) &= \sum_{i=0}^{k-1} x_1 \max(a_{11}^i, \tau_2^i(x_2)) + (1 - x_1)\tau_2^i(x_2) \nonumber \\
&= \sum_{i=0}^{k-1} x_1 \max(a_{11}^i, \tau_2^i(x_2)) + (1 - x_1) x_2 b_{11}^i + (1 - x_1)(1 - x_2) b_{01}^i \tag{2.149} \\
&= \sum_{i=0}^{k-1} x_1 \max(a_{11}^i, \tau_2^i(x_2)) + x_2 b_{11}^i - x_1 x_2 b_{11}^i + (1 - x_1)(1 - x_2) b_{01}^i \tag{2.150} \\
&= J_N^2(x_1, x_2) + \sum_{i=0}^{k-1} x_1 \max(a_{11}^i, \tau_2^i(x_2)) - x_1 x_2 b_{11}^i + (1 - x_1)(1 - x_2) b_{01}^i \nonumber \\
&\qquad\qquad - (1 - x_2) x_1 a_{11}^i - (1 - x_2)(1 - x_1) a_{01}^i \tag{2.151} \\
&\leq J_N^2(x_1, x_2) + \sum_{i=0}^{k-1} x_1 \max(a_{11}^i, \tau_2^i(x_2)) - x_1 x_2 b_{11}^i - x_1(1 - x_2) a_{11}^i \tag{2.152} \\
&= J_N^2(x_1, x_2) + \sum_{i=0}^{k-1} \max\Big( x_1 a_{11}^i - x_1 x_2 b_{11}^i - (1 - x_2) x_1 a_{11}^i,\ x_1 \tau_2^i(x_2) - x_1 x_2 b_{11}^i - (1 - x_2) x_1 a_{11}^i \Big) \tag{2.153} \\
&= J_N^2(x_1, x_2) + \sum_{i=0}^{k-1} \max\Big( x_1 x_2 (a_{11}^i - b_{11}^i),\ x_1 (1 - x_2)(b_{01}^i - a_{11}^i) \Big) \tag{2.154} \\
&\leq J_N^2(x_1, x_2) \tag{2.155}
\end{align}
In the above, (2.151) follows from subtracting (2.148), (2.152) follows from $b_{01}^i \leq a_{01}^i$, and (2.155) follows from $b_{11}^i \geq a_{11}^i$.

Case 2: $x_2 \leq \pi_2$.
Equation (2.145) simplifies to
\begin{align}
J_N^1(x_1, x_2) &= \sum_{i=0}^{k-1} x_1 a_{11}^i + (1 - x_1)\max(a_{01}^i, \tau_2^i(x_2)) \nonumber \\
&= \sum_{i=0}^{k-1} x_1 a_{11}^i + (1 - x_1)\max(a_{01}^i, \tau_2^i(x_2)) + (1 - x_2) x_1 a_{11}^i - (1 - x_2) x_1 a_{11}^i \tag{2.156} \\
&= \sum_{i=0}^{k-1} x_1 x_2 a_{11}^i + (1 - x_1)\max(a_{01}^i, \tau_2^i(x_2)) + (1 - x_2) x_1 a_{11}^i \tag{2.157} \\
&= J_N^2(x_1, x_2) + \sum_{i=0}^{k-1} x_1 x_2 a_{11}^i + (1 - x_1)\max(a_{01}^i, \tau_2^i(x_2)) - x_2 b_{11}^i - (1 - x_2)(1 - x_1) a_{01}^i \tag{2.158} \\
&= J_N^2(x_1, x_2) + \sum_{i=0}^{k-1} \max\Big( x_2\big(x_1 a_{11}^i + (1 - x_1) a_{01}^i\big) - x_2 b_{11}^i, \nonumber \\
&\qquad\qquad\qquad x_1 x_2 (a_{11}^i - b_{11}^i) + (1 - x_1)(1 - x_2)(b_{01}^i - a_{01}^i) \Big) \tag{2.159} \\
&\leq J_N^2(x_1, x_2) \tag{2.160}
\end{align}
where (2.158) results from applying (2.148), and (2.160) comes from the assumption that $a_{11}^i \leq b_{11}^i$.

Inductive Step: Assume that $J_l^1(x_1, x_2) \leq J_l^2(x_1, x_2)$ for all $n+1 \leq l \leq N$; we now prove that $J_n^1(x_1, x_2) \leq J_n^2(x_1, x_2)$. By the induction hypothesis, the optimal cost-to-go satisfies $J_{n+1}(x_1, x_2) = J_{n+1}^2(x_1, x_2)$. By combining expressions (2.145) and (2.146) with the induction hypothesis, it follows that
\begin{align}
\sum_{i=0}^{k-1} x_1 \max(a_{11}^i, \tau_2^i(x_2)) + (1 - x_1)\max(a_{01}^i, \tau_2^i(x_2)) \qquad \nonumber \\
\leq \sum_{i=0}^{k-1} x_2 \max(b_{11}^i, \tau_1^i(x_1)) + (1 - x_2)\max(b_{01}^i, \tau_1^i(x_1)). \tag{2.161}
\end{align}
To conclude the proof:
\begin{align}
x_1 J_{n+1}^2(a_{11}^k, \tau_2^k(x_2)) &+ (1 - x_1) J_{n+1}^2(a_{01}^k, \tau_2^k(x_2)) \nonumber \\
&= J_{n+1}^2(\tau_1^k(x_1), \tau_2^k(x_2)) \tag{2.162} \\
&= x_2 J_{n+1}^2(\tau_1^k(x_1), a_{11}^k) + (1 - x_2) J_{n+1}^2(\tau_1^k(x_1), a_{01}^k) \tag{2.163}
\end{align}
where the above comes from the affineness of the function $J_n(x_1, x_2)$, shown in (2.132)-(2.134). Thus, combining (2.161) with (2.163) proves the theorem.

2.7.4 Proof of Lemmas 2 and 3

Lemma 8. Let $g(x, y)$ be any function satisfying $g(x, y) = ax + by + cxy + d$ for some constants a, b, c, d. Then,
\begin{equation}
g(x, y) - g(y, x) = (x - y)\big(g(1, 0) - g(0, 1)\big) \tag{2.164}
\end{equation}
Proof.
\begin{align*}
g(x, y) - g(y, x) &= ax + by + cxy + d - ay - bx - cyx - d \\
&= (x - y)(a - b) \\
&= (x - y)\big(g(1, 0) - g(0, 1)\big)
\end{align*}
Lemma 2: If $x_1 \geq x_2 \geq x_3$, then for all $0 \leq n \leq N$, $W_n(x_1, x_2, x_3) \geq W_n(x_2, x_1, x_3)$.

Proof of Lemma 2. The proof follows by reverse induction on the probing index n.
For time n = N,
\begin{equation}
W_N(x_1, x_2, x_3) - W_N(x_2, x_1, x_3) = f^k(x_1, x_2) - f^k(x_2, x_1) = 0 \tag{2.165}
\end{equation}
The above equality follows from Lemma 1. Assume the inductive hypothesis holds for n + 1:
\begin{align}
W_n(x_1, x_2, x_3) &- W_n(x_2, x_1, x_3) \nonumber \\
&= (x_1 - x_2)\big(W_n(1, 0, x_3) - W_n(0, 1, x_3)\big) \tag{2.166} \\
&= (x_1 - x_2)\Big(f^k(1, 0) + W_{n+1}\big(\tau^k(1), \tau^k(x_3), \tau^k(0)\big) - f^k(0, 1) - W_{n+1}\big(\tau^k(1), \tau^k(0), \tau^k(x_3)\big)\Big) \tag{2.167} \\
&= (x_1 - x_2)\Big(W_{n+1}\big(\tau^k(1), \tau^k(x_3), \tau^k(0)\big) - W_{n+1}\big(\tau^k(1), \tau^k(0), \tau^k(x_3)\big)\Big) \tag{2.168} \\
&\geq (x_1 - x_2)\Big(W_{n+1}\big(\tau^k(1), \tau^k(0), \tau^k(x_3)\big) - W_{n+1}\big(\tau^k(1), \tau^k(0), \tau^k(x_3)\big)\Big) = 0 \tag{2.169}
\end{align}
The inequality in (2.169) holds by the induction hypothesis of Lemma 3.

Lemma 3: If $x_1 \geq x_2 \geq x_3$, then for all $0 \leq n \leq N$, $W_n(x_1, x_2, x_3) \geq W_n(x_1, x_3, x_2)$.

Proof of Lemma 3. The proof follows by reverse induction on the probing index n. For time n = N,
\begin{align}
W_N(x_1, x_2, x_3) - W_N(x_1, x_3, x_2) &= f^k(x_1, x_2) - f^k(x_1, x_3) \tag{2.170} \\
&= (x_2 - x_3)\sum_{i=0}^{k-1} \big(p_{11}^i - \tau^i(x_1)\big) \tag{2.171} \\
&= (x_2 - x_3)(1 - x_1)\sum_{i=0}^{k-1} (p_{11}^i - p_{01}^i) \geq 0 \tag{2.172}
\end{align}
where the inequality follows from the positive memory assumption on the channel. Now we assume the inductive hypotheses for Lemmas 2 and 3 hold for times at and after n.
\begin{align}
W_n(x_1, x_2, x_3) &- W_n(x_1, x_3, x_2) \nonumber \\
&= (x_2 - x_3)\big(W_n(x_1, 1, 0) - W_n(x_1, 0, 1)\big) \tag{2.173} \\
&= (x_2 - x_3)\Big(f^k(x_1, 1) + W_{n+1}\big(\tau^k(1), \tau^k(x_1), \tau^k(0)\big) - f^k(x_1, 0) - W_{n+1}\big(\tau^k(x_1), \tau^k(1), \tau^k(0)\big)\Big) \tag{2.174} \\
&\geq (x_2 - x_3)\Big(W_{n+1}\big(\tau^k(1), \tau^k(x_1), \tau^k(0)\big) - W_{n+1}\big(\tau^k(x_1), \tau^k(1), \tau^k(0)\big)\Big) \tag{2.175} \\
&\geq (x_2 - x_3)\Big(W_{n+1}\big(\tau^k(1), \tau^k(x_1), \tau^k(0)\big) - W_{n+1}\big(\tau^k(1), \tau^k(x_1), \tau^k(0)\big)\Big) = 0 \tag{2.176}
\end{align}
The inequality in (2.175) follows from (2.170)-(2.172). The inequality in (2.176) follows from the inductive hypothesis of Lemma 2.

Chapter 3

Opportunistic Scheduling with Limited Channel State Information: A Rate Distortion Approach

Consider a transmitter and a receiver connected by two independent channels.
The state of each channel is either ON or OFF, where transmissions over an ON channel result in a unit throughput, and transmissions over an OFF channel fail. Channels evolve over time according to a Markov process. At the beginning of each time slot, the receiver measures the channel states in the current slot, and transmits channel state information (CSI) to the transmitter. Based on the CSI sent by the receiver, the transmitter chooses over which of the channels to transmit. In a system in which an ON channel and OFF channel are equally likely to occur, the transmitter can achieve an expected per-slot throughput of $\frac{1}{2}$ without CSI, and a per-slot throughput of $\frac{3}{4}$ if the transmitter has full CSI before making scheduling decisions, since at least one of the two channels is ON with probability $\frac{3}{4}$. However, the transmitter does not need to maintain complete knowledge of the channel state in order to achieve high throughput; it is sufficient to only maintain knowledge of which channel has the best state. Furthermore, the memory in the system can be used to further reduce the required CSI. We are interested in the minimum rate at which CSI must be sent to the transmitter in order to guarantee a lower bound on expected throughput. This quantity represents a fundamental limit on the overhead information required in this setting. The above minimization can be formulated as a rate distortion optimization with an appropriately designed distortion metric. In particular, the rate distortion function provides a lower bound on the rate at which the transmitter must obtain CSI in order to satisfy an average throughput constraint. The opportunistic communication framework, in contrast to traditional rate distortion, requires that the CSI sequence be causally encoded, as the receiver observes the channel states in real time. Consequently, restricting the rate distortion problem to causal encodings provides a tighter lower bound on the required CSI that must be provided to the transmitter.
Opportunistic scheduling is one of many network control schemes that require network state information (NSI) in order to make control decisions. The performance of these schemes is directly influenced by the availability and accuracy of this information. However, the overhead required to convey this information is often ignored, leading to inaccurate performance guarantees. If the network state changes rapidly, there are more opportunities to exploit an opportunistic performance gain, albeit at the cost of additional overhead. For large networks, this overhead becomes prohibitive. Therefore, it is increasingly important to quantify the minimum amount of information that must be conveyed in order to implement efficient network control, as well as to investigate schemes for controlling a network with limited feedback information.

This chapter presents a novel rate distortion formulation to quantify the fundamental limit on the rate of overhead required for opportunistic scheduling.¹ We design a new distortion metric for this setting that captures the impact of the availability of CSI on network performance, and incorporate a causality constraint into the rate distortion formulation to reflect practical constraints of a real-time communication system. We analytically compute a closed-form expression for the causal rate distortion lower bound for a two-channel system. Additionally, we propose a practical encoding algorithm to achieve the required throughput with limited overhead. Moreover, we show that for opportunistic scheduling, there is a fundamental gap between the mutual information and entropy-rate-based rate distortion functions, and discuss scenarios under which this gap vanishes.

¹An earlier version of this work appeared in [38].

The remainder of this chapter is outlined as follows. The problem is formally presented in Section 3.1. Section 3.2 contains our main result, the formulation and solution to the causal information rate distortion problem.
In Section 3.3, we present an algorithm for encoding the channel state information and quantify its performance. Lastly, in Section 3.4, we analyze the gap between the operational and information rate distortion functions.

3.1 System Model

Consider a transmitter and a receiver, connected through M independent channels, as shown in Figure 3-1. Assume a time-slotted system, where at time slot t, each channel has a time-varying channel state $S_i(t) \in \{\text{OFF}, \text{ON}\}$, independent from all other channels. The notation $S_i(t) \in \{0, 1\}$ is used interchangeably.

Figure 3-1: System Model: A transmitter and receiver connected by M independent channels.

Let $X(t) = X_t = \{S_1(t), S_2(t), \ldots, S_M(t)\}$ represent the system state at time slot t. At each time slot, the transmitter chooses a channel over which to transmit, with the goal of opportunistically transmitting over an ON channel. Channel states evolve over time according to a Markov process described by the chain in Figure 3-2, with transition probabilities p and q satisfying $1 - p - q \geq 0$, corresponding to channels with "positive memory." A channel with positive memory is more likely to be ON if it was ON at the previous time than if it was OFF. Let $\pi$ be the steady-state probability of the channel being in the ON state. For this channel state model, $\pi = \frac{p}{p+q}$.

Figure 3-2: Markov chain describing the channel state evolution of each independent channel.

Figure 3-3: Information structure of an opportunistic communication system. The receiver measures the channel state X(t), encodes this into a sequence Z(t), and transmits this sequence to the transmitter.

The transmitter does not observe the state of the system. Instead, the receiver causally encodes the sequence of channel states $X_1^n$ into the sequence $Z_1^n$ and sends the encoded sequence to the transmitter, as illustrated in Figure 3-3, where $X_1^n$ denotes the vector of random variables $[X(1), \ldots, X(n)]$.
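The channel model above is straightforward to simulate. The following sketch, not part of the thesis and using illustrative parameter values, draws each channel from the two-state chain of Figure 3-2 and checks that the empirical ON fraction approaches $\pi = p/(p+q)$:

```python
import random

# Minimal simulation of the system model: M independent ON/OFF channels,
# each evolving per Figure 3-2 with P(OFF->ON) = p and P(ON->OFF) = q.
# Parameter values below are illustrative assumptions.
def simulate_channels(M, p, q, T, seed=1):
    rng = random.Random(seed)
    pi = p / (p + q)                                  # steady-state P(ON)
    state = [1 if rng.random() < pi else 0 for _ in range(M)]
    history = []
    for _ in range(T):
        history.append(tuple(state))
        # flip each channel with its own transition probability
        state = [1 - s if rng.random() < (p if s == 0 else q) else s
                 for s in state]
    return history

hist = simulate_channels(M=2, p=0.2, q=0.2, T=100000)
frac_on = sum(x[0] for x in hist) / len(hist)         # close to pi = 0.5
```

With $p = q$ the chain is symmetric, so the empirical ON fraction hovers near $\frac{1}{2}$, matching $\pi = p/(p+q)$.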
The encoding $Z(t) = Z_t \in \{1, \ldots, M\}$ represents the index of the channel over which to transmit. Since the throughput-optimal transmission decision is to transmit over the channel with the best state, it is sufficient for the transmitter to restrict its knowledge to the index of the channel with the best state at each time. We assume that the feedback of the index Z(t) happens instantaneously (with zero delay), and the objective is to minimize the information rate over the feedback link. The throughput earned in slot t is $\text{thpt}(x(t), z(t)) = S_{Z(t)}(t)$, since the transmitter uses channel $i = Z(t)$, and receives a throughput of 1 if that channel is ON, and 0 otherwise. Clearly, a higher throughput is attainable with more accurate CSI.

We define a distortion measure between a sequence of channel states $x_1^n$ and an encoded sequence $z_1^n$ to measure the quality of the encoding. The average distortion between the sequences $x_1^n$ and $z_1^n$ is defined in terms of the per-letter distortion,
\begin{equation}
d(x_1^n, z_1^n) = \frac{1}{n}\sum_{i=1}^{n} d(x_i, z_i), \tag{3.1}
\end{equation}
where $d(x_i, z_i)$ is the per-letter distortion between the ith source symbol and the ith encoded symbol at the transmitter. Traditionally, the distortion metric measures the distance between the two sequences. Common examples are the Hamming distortion metric, which measures the probability of error between two sequences, and the mean-squared error distortion metric. In the opportunistic communication setting, these traditional distortion metrics are inappropriate, since the transmitter does not need to know the channel state of each of the channels, but rather which channel yields the highest transmission rate. Thus, for the opportunistic communication framework, the per-letter distortion is defined as
\begin{equation}
d(x_i, z_i) \triangleq 1 - \text{thpt}(x_i, z_i) = 1 - S_{z_i}(i), \tag{3.2}
\end{equation}
where $S_{z_i}$ is the state of the channel indexed by $z_i$. This definition quantifies the loss in throughput incurred by transmitting over channel $z_i$.
Consequently, an upper bound on expected distortion translates to a lower bound on expected throughput.

3.1.1 Problem Formulation

The goal in this chapter is to determine the minimum rate at which CSI must be conveyed to the transmitter to achieve a lower bound on expected throughput. In this setting, CSI must be conveyed to the transmitter causally; in other words, the ith encoding can only depend on the channel state at time i, and previous channel states and encodings. Let $Q_c(D)$ be the family of causal encodings $q(z_1^n | x_1^n)$ satisfying
\begin{equation}
E[d(x_1^n, z_1^n)] = \sum_{x_1^n}\sum_{z_1^n} p(x_1^n)\, q(z_1^n | x_1^n)\, d(x_1^n, z_1^n) \leq D, \tag{3.3}
\end{equation}
where $p(x_1^n)$ is the PDF of the source, and the causality constraint:
\begin{equation}
q(z_1^i | x_1^n) = q(z_1^i | y_1^n) \quad \forall x_1^n, y_1^n \text{ s.t. } x_1^i = y_1^i. \tag{3.4}
\end{equation}
Mathematically, the minimum rate at which CSI must be transmitted is given by
\begin{equation}
R_c^{NG}(D) = \lim_{n \to \infty} \inf_{q \in Q_c(D)} \frac{1}{n} H(Z_1^n), \tag{3.5}
\end{equation}
where $\frac{1}{n} H(Z_1^n)$ is the entropy rate of the encoded sequence in bits. Equation (3.5) is the causal rate distortion function, as defined by Neuhoff and Gilbert [51], and is denoted using the superscript NG. Here, the rate distortion function is defined as a minimization of entropy rate rather than a minimization of mutual information, as in [24, 60, 63], which is discussed in Section 3.2. The decision to formulate this problem as a minimization of entropy rate is based on the intuition that the entropy rate captures the average number of bits per channel use required to convey channel state information.

3.1.2 Previous Work

As mentioned in Section 1.2.2, several works have used rate distortion-based approaches to characterize limits on required control information. While the traditional rate distortion problem has been well studied [6], there have been several works extending these results to Markov sources [7, 25].
In [34], the authors develop bounds on the rate distortion function by assuming every kth source symbol is transmitted noiselessly to the receiver, and using the fact that given those symbols, the remaining source symbols can be viewed as independent blocks. The lower bounds presented in [34] can be made arbitrarily tight, but at the cost of exponentially increasing computational complexity. Additionally, researchers have considered the causal source coding problem due to its application to real-time processing. One of the first works in this field was [51], in which Neuhoff and Gilbert show that the best causal encoding of a memoryless source is a memoryless coding, or a time sharing between two memoryless codes. However, this result pertains to sources without memory. Neuhoff and Gilbert focus on the minimization of entropy rate, as in (3.5). The work in [68] studied the optimal finite-horizon sequential quantization problem, and showed that the optimal encoder for a kth-order Markov source depends on the last k source symbols and the present state of the decoder's memory (i.e. the history of decoded symbols). A similar result was shown in [66] for an infinite-horizon sequential quantization problem. Later, a causal (sequential) rate distortion theory was introduced in [9] and [63] for general stationary sources. They show that the sequential rate distortion function lower bounds the entropy rate of a causally encoded sequence, but this inequality is strict in general. Despite this, operational significance for the causal rate distortion function is developed in [63]. Lastly, [60] studies the causal rate distortion function as a minimization of directed mutual information, and computes the form of the optimal causal stochastic kernels.

3.2 Rate Distortion Lower Bound

To begin, we review the traditional rate distortion problem.
Then, we extend this formulation by defining the causal information rate distortion function, which is a minimization of mutual information, and is known to lower bound $R_c^{NG}(D)$ [9]. The causal information rate distortion function provides a lower bound on the required rate at which CSI must be conveyed to the transmitter to meet the throughput requirement.

3.2.1 Traditional Rate Distortion

Consider the well-known rate distortion problem, in which the goal is to find the minimum number of bits per source symbol necessary to encode a sequence of source symbols while meeting a fidelity constraint. Consider a discrete memoryless source $\{X_i\}_{i=1}^{\infty}$, where each $X_i$ is an i.i.d. random variable taking values in the set $\mathcal{X}$, according to distribution $p_X(x)$. This source sequence is encoded into a sequence $\{Z_i\}_{i=1}^{\infty}$, with $Z_i$ taking values in $\mathcal{Z}$. The distortion between a block of source symbols and encoded symbols is defined as
\begin{equation}
d(x_1^N, z_1^N) = \frac{1}{N}\sum_{i=1}^{N} d(x_i, z_i), \tag{3.6}
\end{equation}
where $d(x_i, z_i)$ is the per-letter distortion between the source symbol $x_i$ and encoded symbol $z_i$. Define $Q(D)$ to be the family of conditional probability distributions $q(z|x)$ satisfying
\begin{equation}
E[d(x, z)] = \sum_{x \in \mathcal{X}}\sum_{z \in \mathcal{Z}} p_X(x)\, q(z|x)\, d(x, z) \leq D. \tag{3.7}
\end{equation}
Shannon's rate distortion theory [15] states that the minimum rate R at which the source can be encoded with average distortion less than D is given by the information rate distortion function R(D), where
\begin{equation}
R(D) \triangleq \min_{q(z|x) \in Q(D)} I(X; Z), \tag{3.8}
\end{equation}
and $I(\cdot\,;\cdot)$ represents mutual information. The rate distortion function satisfies
\begin{equation}
\min_{q(z|x) \in Q(D)} \frac{1}{n} H(Z_1^n) = R(D), \tag{3.9}
\end{equation}
implying that the encoded sequence can be compressed to an average of R(D) bits per symbol. In other words, the problem of minimizing entropy rate can be solved as a minimization of mutual information, which is known to be an easier problem, since the formulation in (3.8) is convex.
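The convex minimization in (3.8) is classically computed by the Blahut-Arimoto iteration. The sketch below, which is not from the thesis and uses an illustrative binary source with Hamming distortion, parameterizes the curve by a Lagrange multiplier $s \geq 0$ and alternates between the optimal test channel and the optimal output marginal:

```python
import math

# Blahut-Arimoto iteration for the (non-causal) information rate distortion
# function R(D) in (3.8).  A sketch: the alphabets and the Hamming metric
# below are illustrative choices, not the thesis's throughput metric.
def blahut_arimoto(p_x, dist, s, iters=500):
    nx, nz = len(p_x), len(dist[0])
    q_z = [1.0 / nz] * nz                       # output marginal, init uniform
    for _ in range(iters):
        # optimal test channel q(z|x) for the current marginal
        q_zx = [[q_z[z] * math.exp(-s * dist[x][z]) for z in range(nz)]
                for x in range(nx)]
        for x in range(nx):
            norm = sum(q_zx[x])
            q_zx[x] = [v / norm for v in q_zx[x]]
        # re-optimize the output marginal
        q_z = [sum(p_x[x] * q_zx[x][z] for x in range(nx)) for z in range(nz)]
    D = sum(p_x[x] * q_zx[x][z] * dist[x][z]
            for x in range(nx) for z in range(nz))
    R = sum(p_x[x] * q_zx[x][z] * math.log2(q_zx[x][z] / q_z[z])
            for x in range(nx) for z in range(nz))
    return D, R

# Binary uniform source, Hamming distortion: known result R(D) = 1 - H_b(D).
D, R = blahut_arimoto([0.5, 0.5], [[0, 1], [1, 0]], s=math.log(3))
Hb = lambda d: -d * math.log2(d) - (1 - d) * math.log2(1 - d)
```

With $s = \ln 3$ the iteration converges to the point $D = \frac{1}{4}$, and the returned rate agrees with the closed form $1 - H_b(\frac{1}{4})$, which is a standard check on the implementation.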
3.2.2 Causal Rate Distortion for Opportunistic Scheduling

Now, the previous formulation is extended to the causal setting for the opportunistic communication problem described in Section 3.1. As discussed above, the information rate distortion function is a minimization of mutual information over all stochastic kernels satisfying a distortion constraint. For opportunistic scheduling, this minimization is further constrained to include only causal kernels. Let $Q_c(D)$ be the set of all stochastic kernels $q(z_1^n | x_1^n)$ satisfying the expected distortion constraint in (3.3) and the causality constraint in (3.4). The causal information rate distortion function is defined as
\begin{equation}
R_c(D) \triangleq \lim_{n \to \infty} \inf_{q(z_1^n | x_1^n) \in Q_c(D)} \frac{1}{n} I(X_1^n; Z_1^n). \tag{3.10}
\end{equation}
This definition is the same as that found in [24, 63], as well as in [60], where it is referred to as a non-anticipatory rate distortion function. As mentioned previously, the function $R_c(D)$ is a lower bound on the Neuhoff-Gilbert rate distortion function $R_c^{NG}(D)$ in (3.5), and hence a lower bound on the rate of CSI that needs to be conveyed to the transmitter to ensure expected per-slot throughput is greater than $1 - D$. In the traditional (non-causal) rate distortion framework, this bound is tight; however, in the causal setting, the minimization of mutual information is potentially very different from the minimization of entropy rate. This is explored further in Section 3.4. Note that for memoryless sources, $R_c(D) = R(D)$, where R(D) is the traditional rate distortion function; however, for most memoryless sources, $R(D) < R_c^{NG}(D)$. The optimization problem in (3.10) is solved using a geometric programming dual as in [13]. The following result gives the structure of the optimal stochastic kernel. Note that this result is also obtained in [60].

Theorem 14.
The optimal kernel $q(z_1^n | x_1^n)$ satisfies
\begin{equation}
q(z_i | z_1^{i-1}, x_1^i) = \frac{Q(z_i | z_1^{i-1}) \exp(-\lambda d(x_i, z_i))}{\sum_{z_i} Q(z_i | z_1^{i-1}) \exp(-\lambda d(x_i, z_i))} \tag{3.11}
\end{equation}
where for all $z_1^i$, $Q(z_i | z_1^{i-1})$ and $\lambda$ satisfy
\begin{equation}
1 = \sum_{x_1^n} \frac{P(x_1^n) \exp\big(-\sum_{i=1}^n \lambda d(x_i, z_i)\big)}{\prod_{i=1}^n \sum_{z_i} Q(z_i | z_1^{i-1}) \exp(-\lambda d(x_i, z_i))} \tag{3.12}
\end{equation}
The proof of Theorem 14 is given in the Appendix. Equation (3.12) holds for all encodings $z_1^n$, and gives a system of equations from which one can solve for $Q(z_i | z_1^{i-1})$. This holds in general for any number of Markovian channels, and can be numerically solved to determine $R_c(D)$. Observe in (3.11) that $q(z_i | z_1^{i-1}, x_1^i) = q(z_i | z_1^{i-1}, x_i)$. In other words, the solution to the rate distortion optimization is a distribution which generates $Z_i$ depending on the source sequence only through the current channel state. This result follows from the Markov property of the channel state sequence.

3.2.3 Analytical Solution for Two-Channel System

While Theorem 14 provides a framework to numerically calculate the causal information rate distortion function, for simple problem settings, $R_c(D)$ can be analytically characterized. Consider the system in Figure 3-1 with two channels (M = 2), where each channel's evolution over time follows the Markov chain in Figure 3-2. Assume the Markov chain is symmetric, i.e. p = q, although a similar analysis holds without this assumption. In this setting, we can obtain a closed-form expression for the causal information rate distortion function.

Theorem 15. For the aforementioned system, the causal information rate distortion function is given by
\begin{equation}
R_c(D) = \tfrac{1}{2} H_b\big(2p - 4pD + 2D - \tfrac{1}{2}\big) - \tfrac{1}{2} H_b\big(2D - \tfrac{1}{2}\big) \tag{3.13}
\end{equation}
for all D satisfying $\tfrac{1}{4} \leq D \leq \tfrac{1}{2}$.

The proof of Theorem 15 is given in the Appendix, and follows from evaluating (3.11) and (3.12) for a two-channel system, and showing the stationarity of the optimal kernel using the following lemma, also proved in the Appendix.

Lemma 9. The optimal values of $Q(z_i | z_1^{i-1})$ satisfy $Q(z_i | z_1^{i-1}) = Q(z_i | z_{i-1})$ for all $1 \leq i \leq n$.
Furthermore, for all i,
\begin{equation}
Q(z_i | z_{i-1}) = \begin{cases} 1 - p & z_i = z_{i-1} \\ p & z_i \neq z_{i-1} \end{cases} \tag{3.14}
\end{equation}
This lemma shows that the optimal distributions $q(z_n | z_{n-1}, x_n)$ and $Q(z_n | z_{n-1})$ are stationary, and the rate distortion problem can be solved as a minimization over a single letter. This result only holds for the two-channel opportunistic scheduling problem.

The information rate distortion function in (3.60) is a lower bound on the rate at which information needs to be conveyed to the transmitter. A distortion $D_{\min} = \frac{1}{4}$ represents a lossless encoding, since for a fraction $\frac{1}{4}$ of the time slots, both channels are OFF, and no throughput can be obtained. Additionally, $D_{\max} = \frac{1}{2}$ corresponds to an oblivious encoder, as transmitting over an arbitrary channel requires no CSI, and achieves distortion equal to $\frac{1}{2}$. The function $R_c(D)$ is plotted in Figure 3-4 as a function of D. As the memory in the channel state process increases (state transition probability p decreases), the required overhead rate decreases, as the transmitter needs less information to accurately estimate the state of the channels.

Figure 3-4: Causal information rate distortion function for different state transition probabilities p for a two-channel opportunistic scheduling system.

3.3 Heuristic Upper Bound

The causal information rate distortion function $R_c(D)$ computed in the previous section provides a lower bound on the Neuhoff-Gilbert rate distortion function. To quantify the tightness of the bound, we propose an algorithmic upper bound to $R_c^{NG}$ in (3.5). For simplicity, assume that p = q and that M = 2, i.e. the transmitter has two symmetric channels over which to transmit. Therefore, $X(t) \in \{00, 01, 10, 11\}$.

Figure 3-5: Definition of K, the time since the last change in the sequence Z(t), with respect to the values of Z(t) up to time t.
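Before developing the upper bound, the closed form of Theorem 15 can be sanity-checked at the distortion endpoints $D_{\min} = \frac{1}{4}$ and $D_{\max} = \frac{1}{2}$ discussed above. The sketch below is not from the thesis; the value $p = 0.2$ is an illustrative choice:

```python
import math

# Closed-form causal information rate distortion function of Theorem 15
# (two symmetric channels, p = q).  Illustrative parameter p = 0.2.
def Hb(x):
    """Binary entropy in bits, with the convention H_b(0) = H_b(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def Rc(D, p):
    """R_c(D) per (3.13), valid for 1/4 <= D <= 1/2."""
    assert 0.25 <= D <= 0.5
    return 0.5 * Hb(2 * p - 4 * p * D + 2 * D - 0.5) - 0.5 * Hb(2 * D - 0.5)

# Lossless encoding costs Hb(p)/2 bits per slot; the oblivious encoder
# (D = 1/2) costs nothing.
r_min = Rc(0.25, p=0.2)     # equals 0.5 * Hb(0.2)
r_max = Rc(0.50, p=0.2)     # equals 0
```

Substituting $D = \frac{1}{2}$ makes both binary entropy arguments equal to $\frac{1}{2}$, so the rate vanishes, while $D = \frac{1}{4}$ leaves $\frac{1}{2}H_b(p)$; both endpoints match the oblivious and lossless cases described in the text.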
Observe that when X(t) = 11, no distortion is accumulated regardless of the encoding Z(t), and a unit distortion is always accumulated when X(t) = 00. The minimum possible average distortion is $D_{\min} = \frac{1}{4}$, since the state of the system is 00 for a fraction $\frac{1}{4}$ of the time.

3.3.1 Minimum Distortion Encoding Algorithm

To begin, we present a causal encoding of the state sequence that achieves minimum distortion. Recall that a causal encoder $f(\cdot)$ satisfies $Z(t) = f(X_1^t, Z_1^{t-1})$. Consider the following encoding policy:
\begin{equation}
Z(t) = \begin{cases} Z(t-1) & \text{if } X(t) = 00 \text{ or } X(t) = 11 \\ 1 & \text{if } X(t) = 10 \\ 2 & \text{if } X(t) = 01 \end{cases} \tag{3.15}
\end{equation}
Note that Z(t) is a function of Z(t-1) and X(t), and is therefore a causal encoding as defined in (3.4). The above encoding achieves a minimum expected distortion equal to $\frac{1}{4}$. Note that the transmitter does not learn the complete channel state through this encoding, but conveying full CSI requires additional rate with no further reduction in distortion.

Let K(i) be a random variable denoting the number of time slots since the last change in the sequence Z(i), i.e.,
\begin{equation}
K(i) = \min_j \{j < i \,|\, Z(i - j) \neq Z(i - j - 1)\}. \tag{3.16}
\end{equation}
Since $Z(i) \in \{1, 2\}$, K(i) is interpreted as the length of the current run of ones or twos in the encoded sequence, as illustrated in Figure 3-5. Thus, at each time, the transmitter is able to infer the state of the system up to K(i) slots ago. Since the channel state is Markovian, the entropy rate of the sequence $Z_1^\infty$ is expressed as
\begin{align}
\lim_{n \to \infty} \frac{1}{n} H(Z_1^n) &= \lim_{n \to \infty} \frac{1}{n}\sum_{i=1}^{n} H(Z(i) \,|\, Z_1^{i-1}) \tag{3.17} \\
&= H(Z(i) \,|\, Z(i-1), K(i)) \tag{3.18} \\
&= \sum_{k=1}^{\infty} P(K = k)\, H_b\big(P(Z(i) \neq Z(i-1) \,|\, K = k)\big) \tag{3.19}
\end{align}
where $H_b(\cdot)$ is the binary entropy function. Note that by definition, Z(i-1) = Z(i - K(i)) in (3.18). Equation (3.19) can be computed numerically in terms of the transition probabilities of the Markov chain in Figure 3-2.
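The encoder (3.15) can be simulated directly to confirm that its long-run distortion approaches $D_{\min} = \frac{1}{4}$. The following sketch is not from the thesis; the parameter $p = 0.2$ and the horizon are illustrative assumptions:

```python
import random

# Simulation of the minimum-distortion causal encoder (3.15) on two
# symmetric channels (p = q).  Illustrative parameters; a sketch only.
def run_encoder(p, T, seed=7):
    rng = random.Random(seed)
    s = [rng.random() < 0.5, rng.random() < 0.5]   # start in steady state
    z = 1                                          # current encoded index
    distortion = 0
    for _ in range(T):
        if s[0] != s[1]:                # states 10 / 01: point to the ON channel
            z = 1 if s[0] else 2
        # states 00 and 11: keep z unchanged, per (3.15)
        distortion += 0 if s[z - 1] else 1
        # each channel flips with probability p
        s = [(not si) if rng.random() < p else si for si in s]
    return distortion / T

d = run_encoder(p=0.2, T=200000)   # approaches D_min = 1/4
```

Distortion is incurred only in state 00 (the pointed channel is always ON when the states differ), so the empirical average settles near $\frac{1}{4}$, the steady-state probability of 00.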
3.3.2 Threshold-based Encoding Algorithm

To further reduce the rate of the encoded sequence below that of the encoder in (3.15), a higher expected distortion must be tolerated. We now introduce a new algorithm by introducing a parameter T, and modifying the encoding algorithm in (3.15) as follows: if $K(i) \leq T$, then Z(i) = Z(i-1), and if K(i) > T, then Z(i) is assigned according to (3.15). As a result, for the first T slots after the Z(i) sequence changes value, the transmitter can determine the next element of the sequence deterministically, and hence the sequence is encoded with zero rate. After T slots, the entropy in the Z(i) process is similar to that of the original encoding algorithm.

As expected, this reduction in entropy rate comes at an increase in distortion. In the first T slots after a change to Z(i) = 1, every visit to state X(i) = 01 or X(i) = 00 incurs a unit distortion. Therefore, the accumulated distortion is equal to the number of visits to those states in an interval of T slots. Clearly, as the parameter T increases, the entropy rate decreases, and the expected distortion increases. Consequently, T parameterizes the rate-distortion curve; however, due to the integer restriction, only a countable number of rate-distortion pairs are achievable by varying T. To generate the full R-D curve, time sharing is used to interpolate between the points parameterized by T. An example curve is shown in Figure 3-6, for p = q = 0.2. Note that as T increases, the corresponding points on the R(D) curve become more dense. Furthermore, for the region of R(D) parameterized by large T, the function R(D) is linear. The slope of this linear region is characterized by the following result.

Proposition 4. Let R(T) and D(T) denote the rate and expected distortion as functions of the parameter T, respectively.
For large T, the achievable R(D) curve for the above encoding algorithm, denoted by the points (D(T), R(T)), has slope
\begin{equation}
\lim_{T \to \infty} \frac{R(T+1) - R(T)}{D(T+1) - D(T)} = \frac{-H(M)}{c + \frac{1}{4}E[M]}, \tag{3.20}
\end{equation}
where M is a random variable denoting the number of slots after the initial T slots until the $Z_i$ sequence changes value, and c is a constant given by
\begin{equation}
c = \sum_{i=1}^{T} \Big( E[\mathbb{1}(X_i = 00 \text{ or } X_i = 01)] - E[\mathbb{1}(X_i = 00 \text{ or } X_i = 01) \,|\, X_0 = 10] \Big). \tag{3.21}
\end{equation}
The proof of Proposition 4 is given in the Appendix. The constant in (3.21) represents the difference in expected accumulated distortion over an interval of T slots between the state process beginning in steady state and the state process beginning at state $X_0 = 10$. Proposition 4 shows that the slope of R(D) is independent of T for T sufficiently large. As T grows large, the value of $Z_i$ changes rarely, and therefore distortion is accumulated in half of the states. Hence, at zero rate, a distortion of $\frac{1}{2}$ is attainable.

Figure 3-6 plots the algorithmic upper bound as a function of distortion, obtained by varying the parameter T from 1 to 20 and time sharing between these points. Data points are computed using Monte Carlo simulation. Additionally, Figure 3-6 shows the causal information rate distortion function for the same channel transition probabilities for comparison.

Figure 3-6: The causal information rate distortion function $R_c(D)$ (Section 3.2) and the upper bound to the rate distortion function (Section 3.3), computed using Monte Carlo simulation. Transition probabilities satisfy p = q = 0.2.

3.4 Causal Rate Distortion Gap

Figure 3-6 shows a gap between the causal information rate distortion function and the heuristic upper bound to the Neuhoff-Gilbert rate distortion function computed in Section 3.3.
In this section, we prove that for a class of distortion metrics including the throughput metric in (3.2), there exists a gap between the information and Neuhoff-Gilbert causal rate distortion functions, even at $D = D_{\min}$. To illustrate this concept, consider a discrete memoryless source $\{X_i\}$, drawing i.i.d. symbols uniformly from the alphabet {0, 1, 2}, and an encoded sequence $\{Z_i\}$ drawn from {0, 1, 2}. Consider the following distortion metrics: $d_1(x, z) = \mathbb{1}_{z \neq x}$ and $d_2(x, z) = \mathbb{1}_{z = x}$, where $\mathbb{1}$ is the indicator function. The first metric $d_1(x, z)$ is a simple Hamming distortion measure, used to minimize probability of error, and the second metric is an inverse Hamming measure. Note that for the second distortion metric, there exist two distortion-free encodings for each source symbol. The causal rate distortion functions $R_c(D)$ for $d_1(x, z)$ and $d_2(x, z)$ are computed using the results of Theorem 14, and the fact that $q(z_i | z_1^{i-1}, x_i) = q(z_i | x_i)$ due to the memoryless property of the source:
\begin{align}
R_{c,1}(D) &= -H_b(D) - D \log\tfrac{2}{3} - (1 - D)\log\tfrac{1}{3}, \quad 0 \leq D \leq \tfrac{2}{3}, \tag{3.22} \\
R_{c,2}(D) &= -H_b(D) - D \log\tfrac{1}{3} - (1 - D)\log\tfrac{2}{3}, \quad 0 \leq D \leq \tfrac{1}{3}. \tag{3.23}
\end{align}
Additionally, Neuhoff and Gilbert [51] show that for a memoryless source, $R_c^{NG}$ equals the lower convex envelope of all memoryless encoders for this source. The relevant memoryless encoders that lie on the convex envelope are the minimum-rate zero-distortion encoder, and the oblivious encoder, which always outputs the same index, requiring zero rate and accumulating high distortion. Thus, the entropy-rate-based rate distortion functions are given by
\begin{align}
R_{c,1}^{NG}(D) &= \big(1 - \tfrac{3}{2}D\big)\log 3, \quad 0 \leq D \leq \tfrac{2}{3}, \tag{3.24} \\
R_{c,2}^{NG}(D) &= (1 - 3D)\, H_b\big(\tfrac{1}{3}\big), \quad 0 \leq D \leq \tfrac{1}{3}. \tag{3.25}
\end{align}
The information and Neuhoff-Gilbert rate distortion functions for the two metrics are plotted in Figure 3-7. Note that for both distortion metrics, the causal rate distortion function is not operationally achievable.
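The lossless-endpoint behavior of (3.22)-(3.25) can be checked numerically: at $D_{\min} = 0$, the Hamming metric shows no gap, while the inverse-Hamming metric does. The sketch below is not from the thesis; logarithms are taken base 2:

```python
import math

# Endpoint check of the example rate distortion functions (3.22)-(3.25).
# At D_min = 0, the inverse-Hamming metric d2 exhibits a strict gap between
# the information and Neuhoff-Gilbert functions; the Hamming metric d1 does not.
log2 = math.log2
Hb = lambda d: 0.0 if d in (0.0, 1.0) else -d * log2(d) - (1 - d) * log2(1 - d)

Rc1 = lambda D: -Hb(D) - D * log2(2 / 3) - (1 - D) * log2(1 / 3)   # (3.22)
Rc2 = lambda D: -Hb(D) - D * log2(1 / 3) - (1 - D) * log2(2 / 3)   # (3.23)
RcNG1 = lambda D: (1 - 1.5 * D) * log2(3)                          # (3.24)
RcNG2 = lambda D: (1 - 3 * D) * Hb(1 / 3)                          # (3.25)

gap1 = RcNG1(0.0) - Rc1(0.0)   # 0: no gap under Hamming distortion
gap2 = RcNG2(0.0) - Rc2(0.0)   # Hb(1/3) - log2(3/2) > 0: a strict gap
```

Both curve pairs also meet zero at their maximum distortions ($D = \frac{2}{3}$ and $D = \frac{1}{3}$, respectively), consistent with the oblivious encoder.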
Furthermore, in the lossless encoding case (D = D_min), there is a gap between the two rate distortion functions under the second distortion metric, but not under the Hamming distortion metric. This gap arises when, for a state x, there exist multiple encodings z that can be used with no distortion penalty. This observation is formalized in the following result.

Theorem 16. Let {X_i} represent an i.i.d. discrete memoryless source from alphabet \mathcal{X}, encoded into a sequence {Z_i} taken from alphabet \mathcal{Z}, subject to a per-letter distortion metric d(x_i, z_i). Furthermore, suppose there exist x_1, x_2, y \in \mathcal{X} and z_1, z_2 \in \mathcal{Z}, with z_1 \neq z_2, such that

a) P(x_1) > 0, P(x_2) > 0, P(y) > 0,
b) z_1 is the minimizer z_1 = \arg\min_z d(x_1, z),
c) z_2 is the minimizer z_2 = \arg\min_z d(x_2, z),
d) d(y, z_1) = d(y, z_2) = \min_z d(y, z).

Then R_c^{NG}(D_min) > R_c(D_min).

Proof. By [51], there exists a deterministic function f : \mathcal{X} \to \mathcal{Z} such that

R_c^{NG}(D_min) = H(f(X))    (3.26)

E[d(X, f(X))] = D_min    (3.27)

Define a randomized encoding q(z|x), where z = f(x) for all x \neq y, and the source symbol y is encoded randomly into z_1 or z_2 with equal probability. Consequently, H(Z|X) > 0, and H(Z) > I_q(X; Z) under the encoding q(z|x). Note that the new encoding also satisfies E_q[d(X, Z)] = D_min. To conclude,

R^{NG}(D_min) = H(f(X)) > I_q(X; Z) \geq R(D_min) = R_c(D_min).    (3.28)

Figure 3-7: Rate distortion functions for example systems. (a) Distortion d_1(x, z). (b) Distortion d_2(x, z).

Theorem 16 shows that if there exists only one deterministic mapping f : \mathcal{X} \to \mathcal{Z} resulting in minimum distortion, then there is no gap between the Neuhoff-Gilbert rate distortion function and the causal information rate distortion function at D_min. However, when there are multiple deterministic mappings that achieve minimum distortion, a randomized combination of them results in lower mutual information, creating a gap between the minimization of mutual information and the minimization of entropy rate.
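The mechanism behind the gap can be made concrete for the inverse Hamming example above. The sketch below (Python; the particular merging encoder f is one illustrative choice, not prescribed by the thesis) computes the entropy of a minimum-entropy deterministic zero-distortion encoder, and the mutual information of a randomized zero-distortion encoder. Unlike the proof, which randomizes only the single symbol y, the sketch randomizes every symbol; for this symmetric source that attains R_{c,2}(0) exactly.

```python
import math
from itertools import product

px = {0: 1 / 3, 1: 1 / 3, 2: 1 / 3}   # uniform ternary source

def entropy(dist):
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

# A minimum-entropy deterministic zero-distortion encoder for d2(x,z) = 1{z=x}:
# any f with f(x) != x works, and merging two inputs gives H(f(X)) = Hb(1/3),
# matching (3.25) at D = 0.
f = {0: 1, 1: 0, 2: 1}
pz_det = {}
for x, p in px.items():
    pz_det[f[x]] = pz_det.get(f[x], 0.0) + p
print(entropy(pz_det))   # ~0.918 bits

# Randomized zero-distortion encoder: choose uniformly between the two
# distortion-free outputs z != x for every symbol.
q = {(x, z): (0.5 if z != x else 0.0) for x, z in product(range(3), repeat=2)}
pz = {z: sum(px[x] * q[(x, z)] for x in range(3)) for z in range(3)}
mi = sum(px[x] * q[(x, z)] * math.log2(q[(x, z)] / pz[z])
         for x in range(3) for z in range(3) if q[(x, z)] > 0)
print(mi)                # log2(3) - 1 ~ 0.585 bits: strictly below H(f(X))
```

The randomized encoder keeps the distortion at zero while its mutual information drops below the entropy of any deterministic zero-distortion encoder, which is exactly the source of the gap.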
Note that the throughput distortion metric in (3.2) satisfies the conditions of Theorem 16, since any encoding that returns the index of an ON channel incurs no distortion (it results in a successful transmission). While the above result proves that the causal information rate distortion function is not tight, it is still possible to provide an operational interpretation to R_c(D) in (3.60). In [63], the author proves that for a source {X_{n,t}}, which is Markovian across the time index t, yet i.i.d. across the spatial index n, there exist blocks of sufficiently large t and n such that the causal rate distortion function is operationally achievable, i.e., the information and Neuhoff-Gilbert rate distortion functions are equal. In the opportunistic scheduling setting, this is equivalent to a transmitter sending N messages to the receiver, where each transmission is assigned a disjoint subset of the channels over which to transmit. However, this restriction results in a reduced throughput, as separate transmitters must be restricted from selecting channels belonging to another transmitter's subset of channels. Relaxing this restriction further improves throughput, but reintroduces the gap between rate distortion functions.

3.5 Application to Channel Probing

In Chapter 2, we analyzed optimal probing schemes to maximize throughput in an opportunistic communication system. In this section, channel probing is interpreted as an encoding of CSI to be sent to the transmitter. We compare the information overhead under the channel probing framework to the lower bound found in Section 3.2, in order to evaluate channel probing as a strategy for acquiring CSI. Consider a system with two channels, where each channel evolves over time according to the Markov chain in Figure 2. Assume the transmitter probes one of the channels every T slots, and uses the CSI gathered from channel probes to opportunistically schedule a transmission.
A small probing interval T corresponds to a high rate of CSI acquisition at the transmitter, but the frequent availability of CSI leads to a high system throughput. On the other hand, large probing intervals lead to lower throughput, but CSI updates are sent less frequently. Therefore, by computing both the entropy rate of the information obtained by the probing process and the achievable throughput, the probing interval T parameterizes a rate-distortion curve, which can be compared with the rate distortion lower bound in Theorem 15. In Chapter 2, it is shown that for fixed probing intervals over a two-channel system, the policy which probes channel 1 at each probe is optimal. As in Section 3.1, let Z(t) be the index of the channel that the transmitter activates. If a probe does not occur at time t, then the transmitter uses the same channel as in the previous slot, as no new CSI has been gathered, so Z(t) = Z(t-1). On the other hand, if a probe occurs, Z(t) is computed based on the result of the probe: if channel 1 is ON, then Z(t) = 1, and if it is OFF, Z(t) = 2. The entropy rate of the sequence Z_1^\infty is expressed as

\lim_{n \to \infty} \frac{1}{n} H(Z_1^n) = \lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} H(Z(i) | Z_1^{i-1})    (3.29)

= \lim_{k \to \infty} \frac{1}{kT} \sum_{i=1}^{k} \sum_{j=0}^{T-1} H(Z(ki + j) | Z_1^{ki+j-1})    (3.30)

= \lim_{k \to \infty} \frac{1}{k} \sum_{i=1}^{k} \frac{1}{T} H(Z(ki) | Z_1^{ki-1})    (3.31)

= \frac{1}{T} H(S(T) | S(0))    (3.32)

Equation (3.30) follows by breaking the sequence of time slots up into separate channel probes. Equation (3.31) is a simplification based on the fact that only one probe occurs every T slots, and this is the only slot in which CSI is conveyed. Lastly, (3.32) follows since the entropy of Z(t) at each probing instance t is equal to the entropy of the result of the probe, which, by the Markov property of the channel state, depends on the past probes only through the result of the previous probe. Equation (3.32) is evaluated using the channel statistics as follows.
\frac{1}{T} H(S(T)|S(0)) = \frac{(1 - \pi) H(S(T)|S(0) = 0) + \pi H(S(T)|S(0) = 1)}{T}    (3.33)

= \frac{(1 - \pi) H_b(p_{01}^T) + \pi H_b(p_{11}^T)}{T}    (3.34)

where \pi is the steady-state probability of a channel being ON, p_{ij}^k is the k-step transition probability of the Markov chain, and H_b(\cdot) is the binary entropy function. Using the throughput distortion constraint in (3.2), the expected distortion is given by one minus the expected per-slot throughput. In Chapter 2, it is shown that the average per-slot throughput in this system is given by

E[Thpt] = \pi + \frac{\pi p_{10}^T}{T(p + q)},    (3.35)

which implies the expected distortion is given by

E[D] = (1 - \pi) - \frac{\pi p_{10}^T}{T(p + q)}.    (3.36)

In summary, the probing interval T parameterizes a rate distortion curve (R(T), D(T)) given by

(R(T), D(T)) = \left( \frac{(1 - \pi) H_b(p_{01}^T) + \pi H_b(p_{11}^T)}{T}, \; (1 - \pi) - \frac{\pi p_{10}^T}{T(p + q)} \right)    (3.37)

In Figure 3-8, this rate distortion curve is plotted along with the causal information rate distortion lower bound and the heuristic upper bound from Figure 3-6, yielding a new algorithmic upper bound on the rate at which CSI must be conveyed.

Figure 3-8: Causal information rate distortion lower bound, heuristic upper bound, and probing algorithmic upper bound for a two-channel system with p = q = 0.2.

The probing policy does not perform as well as the heuristic upper bound in Section 3.3, which suggests that this channel probing policy is not an efficient method of acquiring CSI. However, from the analysis in Chapter 2, it is known that the throughput of probing policies is increased by dynamically optimizing the probing intervals based on the result of each channel probe.
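The curve in (3.37) is straightforward to evaluate. The sketch below (Python; helper names are illustrative) uses the standard closed form for the k-step transition probabilities of a two-state ON/OFF chain, with p the OFF-to-ON and q the ON-to-OFF transition probability.

```python
import math

def hb(x):
    """Binary entropy in bits, with Hb(0) = Hb(1) = 0."""
    if x in (0.0, 1.0):
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def kstep(p, q, k):
    """k-step transition probabilities of a two-state ON/OFF chain
    (p: OFF->ON, q: ON->OFF)."""
    r = (1 - p - q) ** k
    p01 = (p - p * r) / (p + q)
    p11 = (p + q * r) / (p + q)
    return p01, p11, 1 - p11   # p10 = 1 - p11

def probing_point(p, q, T):
    """Rate and distortion of the fixed-interval probing policy, eq. (3.37)."""
    pi = p / (p + q)
    p01, p11, p10 = kstep(p, q, T)
    rate = ((1 - pi) * hb(p01) + pi * hb(p11)) / T
    dist = (1 - pi) - pi * p10 / (T * (p + q))
    return rate, dist

for T in (1, 2, 5, 10, 20):
    r, d = probing_point(0.2, 0.2, T)
    print(f"T={T:2d}  R={r:.3f}  D={d:.3f}")
# T = 1 gives (R, D) = (Hb(0.2), 0.25); as T grows, R -> 0 and D -> 1 - pi = 0.5.
```

Sweeping T traces out the probing upper bound plotted in Figure 3-8, from high-rate/low-distortion at T = 1 toward zero rate and distortion 1 - \pi as the probes become infrequent.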
Using the state action frequency approach in Chapter 2 to compute the optimal dynamic probing intervals would lead to an improved performance; however, due to the complexity of dynamically optimizing the probing intervals, deriving an analytic expression for the rate-distortion tradeoff is difficult. 3.6 Summary In this chapter, we considered an opportunistic communication system in which a transmitter selects one of multiple channels over which to schedule a transmission, based on partial knowledge of the network state. We characterized a fundamental limit on the rate that CSI must be conveyed to the transmitter in order to meet a constraint on expected throughput, by modeling the problem as a causal rate distortion optimization of a Markov source. We introduced a novel distortion metric which measures the impact on throughput that a particular CSI encoding has. For the case of a two-channel system, a closed-form expression is derived for the causal information rate distortion lower bound. Furthermore, an algorithmic upper 107 bound is proposed to compare to the lower bound. The gap between the two bounds is characterized, and we proved that this gap is inherent in using a causal encoding for channel state information. Lastly, we characterized the rate-distortion performance of the probing scheme in Chapter 2 to compare to the rate distortion lower bound. 3.7 3.7.1 Appendix Proof of Theorem 14 Theorem 14: The optimal kernel q(z1n |xn1 ) satisfies Q(zi |z1i−1 ) exp(−λd(xi , zi )) i−1 zi Q(zi |z1 ) exp − λd(xi , zi ) q(zi |z1i−1 , xi1 ) = P (3.38) where for all z1i , Q(zi |z1i−1 ) and λ satisfy 1= X xn 1 P P (xn1 ) exp − ni=1 λd(xi , zi ) Qn P i−1 i=1 zi Q(zi |z1 ) exp − λd(xi , zi ) (3.39) Proof of Theorem 14. Any stochastic kernel q(z1n |xn1 ) ∈ Qc (D) can be decomposed as q(z1n |xn1 ) = n Y q(zi |z1i−1 , xi1 ) (3.40) i=1 using the causality of the distribution. The rate-distortion optimization is given by Min. 1 XX p(xn1 )q(z1n |xn1 ) log n xn z n 1 s.t. 
1 n X q(z1n |xn1 ) P n n n x̂n p(x̂1 )q(z1 |x̂1 ) (3.41) 1 1 XX d(xi , zi ) ≤ D p(xn1 )q(z1n |xn1 ) n xn z n i=1 1 1 X q(z1i |xi1 ) = 1 ∀i, xi1 , z1i−1 (3.42) (3.43) z1i q(z1i |xi1 ) ≥ 0 ∀i, xi1 , z1i (3.44) The objective in (3.41) is the definition of mutual information. Equation (3.42) is 108 the per-letter average distortion constraint. Equations (3.43) and (3.44) ensure that q(z1n |z1n ) is a valid causal probability distribution. To see this, consider the constraints ordered in time. For i = 1, X q(zi |x1 ) = 1; q(z1 |x1 ) ≥ 0 ∀x1 (3.45) z1 Using an inductive argument, if q(z1i−1 |z1i−1 ) is a valid distribution, then the constraint 1= X q(z1i |xi1 ) = XX q(zi |z1i−1 , xi1 )q(z1i−1 |x1i−1 ) (3.46) zi z i−1 1 z1i = X q(zi |z1i−1 , xi ) (3.47) zi ensures that q(zi |z1i−1 , xi ) and q(z1i |xi1 ) are valid distributions. Thus, Qn i=1 q(zi |z1i−1 , xi ) is a valid distribution. The Lagrangian for the above optimization is derived by relaxing constraint (3.42) with dual variable λ and constraints (3.43) with dual variables µi (xi1 , z1i−1 ): 1 XX q(z1n |xn1 ) n n n L(q, λ, µ ) = p(x1 )q(z1 |x1 ) log P p(x̂n1 )q(z1n |x̂n1 ) n xn zn x̂n 1 1 1 XX n X 1 n n n +λ p(x1 )q(z1 |x1 ) d(xi , zi ) − D n xn z n i=1 i 1 − 1 n XX X i=1 xn 1 µi (xi1 , z1i−1 )(q(z1n |xn1 ) − 1) (3.48) z1n Differentiating equation (3.48) with respect to each q(z1n |xn1 ) and equating to zero yields 1 ∂ L(q, λ, µi ) = p(xn1 ) log n n ∂q(z1 |x1 ) n q(z1n |xn1 ) P n n n x̂n p(x̂1 )q(z1 |x̂1 ) 1 n n X 1X + p(xn1 ) d(xi , zi ) − µi (xi1 , z1i−1 ) = 0 n i=1 i=1 109 (3.49) −nµi (xi1 ,z1i−1 ) p(xn 1) Let αi (xn1 , z1i−1 ) = and let Q(z1n ) = P xn 1 p(xn1 )q(z1n |xn1 ). Solving (3.49) for q(z1n |xn1 ) yields. 
q(z1n |xn1 ) = Q(z1n ) exp n X i n i−1 − (λd(xi , zi ) + α (x1 , z1 )) (3.50) i=1 From (3.40), n Y q(zi |z1i−1 , xi1 ) = Q(z1n ) exp − i=1 n X (λd(xi , zi ) + α i (xn1 , z1i−1 )) (3.51) i=1 = n Y Q(zi |z1i−1 ) exp(−λd(xi , zi )) exp(−αi (xn1 , z1i−1 )) (3.52) i=1 q(zi |z1i−1 , xi1 ) = Q(zi |z1i−1 ) exp(−λd(xi , zi )) exp(−αi (xn1 , z1i−1 )) (3.53) Summing (3.53) over zi yields 1= X q(zi |z1i−1 , xi1 ) = zi X Q(zi |z1i−1 ) exp(−λd(xi , zi )) exp(−αi (xn1 , z1i−1 )) (3.54) zi α i (xn1 , z1i−1 ) = log X Q(zi |z1i−1 ) exp(−λd(xi , zi )) (3.55) zi Plugging in (3.55) to (3.50) gives an equation for the optimal stochastic kernel q(z1n |xn1 ). Pn n ) exp − λd(x , z ) Q(z i i i=1 q(z1n |xn1 ) = Qn P1 i−1 ) exp − λd(xi , zi ) Q(z |z i 1 i=1 zi (3.56) Q(zi |z1i−1 ) exp(−λd(xi , zi )) =P i−1 Q(z |z ) exp − λd(x , z ) i i i 1 zi (3.57) and q(zi |z1i−1 , xi1 ) To solve for the optimal stochastic kernels, multiply (3.56) by p(xn1 ) and sum over all values of xn1 . 110 X P Q(z1n ) exp − ni=1 λd(xi , zi ) = Qn P i−1 i=1 zi Q(zi |z1 ) exp − λd(xi , zi ) P X P (xn1 ) exp − ni=1 λd(xi , zi ) Qn P 1= i−1 Q(z |z ) exp − λd(x , z ) i i i n 1 i=1 zi x p(xn1 )q(z1n |xn1 ) xn 1 (3.58) (3.59) 1 3.7.2 Proof of Theorem 15 Theorem 15: For the aforementioned system, the causal information rate distortion function is given by Rc (D) = 12 Hb 2p − 4pD + 2D − for all D satisfying 1 4 1 2 − 12 Hb 2D − 1 2 (3.60) ≤ D ≤ 12 . Proof of Theorem 15. From Theorem 14, the optimizing stochastic kernel q(z1n |xn1 ) satisfies (3.11). To begin, consider the conditional distribution of the first symbol in the encoded sequence, z1 . Q(z1 )e−λd(x1 ,z1 ) q(z1 |x1 ) = Q(Z1 = 1)e−λd(x1 ,Z1 =1) + Q(Z1 = 2)e−λd(x1 ,Z1 =2) ∀z1 (3.61) By multiplying both sides by P (x1 ) and summing over all values of x1 , 1= X x1 P (x1 )e−λd(x1 ,z1 ) Q(Z1 = 1)e−λd(x1 ,Z1 =1) + Q(Z1 = 2)e−λd(x1 ,Z1 =2) ∀z1 (3.62) The two equations in (3.62) are solved for the two unknowns Q(Z1 = 1) and Q(Z1 = 2), denoted as Q1 and Q2 respectively. 
Using the fact that d(xi , zi ) = 1 − szi , and 111 P (X1 ) = 1 4 for all X1 . e−λ 1 e−λ 1 + + = Q1 e−λ + Q2 Q1 + Q2 e−λ Q1 e−λ + Q2 Q1 + Q2 e−λ 1 Q1 = Q2 = 2 (3.63) (3.64) Now, consider the conditional distribution of the first two symbols in the encoded sequence, z1 and z2 . From (3.60), P2 2 Q(z ) exp − λd(x , z ) i i i=1 q(z12 |x21 ) = Q2 P 1 i−1 Q(z |z ) exp − λd(xi , zi ) i 1 i=1 zi P X P (X12 ) exp − 2i=1 λd(xi , zi ) 1= Q2 P i−1 i=1 zi Q(zi |z1 ) exp − λd(xi , zi ) x2 (3.65) (3.66) 1 = X X P (X1 |X2 )e−λd(x1 ,z1 ) P (X2 )e−λd(x2 ,z2 ) P −λd(x2 ,z2 ) −λd(x1 ,z1 ) z2 Q(z2 |z1 )e z1 Q(z1 )e x P x2 (3.67) 1 Let f (x2 , z1 ) be equal to the last term in (3.11), i.e. f (x2 , z1 ) = X P (X1 |X2 )e−λd(x1 ,z1 ) X 2P (X1 |X2 )e−λd(x1 ,z1 ) P = −λd(x1 ,z1 ) e−λd(x1 ,Z1 =1) + e−λd(x1 ,Z1 =2) z1 Q(z1 )e x x 1 (3.68) 1 where the last equality follows from (3.64). The four equations in (3.67) can be solved for the four unknowns Q(z2 |z1 ). First consider Z1 = 1. Let Q(Z2 = i|Z1 = j) = Qi|j X x2 1 −λd(x2 ,Z2 =1) e f (x2 , z1 = 1) 2 −λd(x ,Z =1) 2 2 Q1|1 e + Q2|1 e−λd(x2 ,Z2 =2) = X x2 1 −λd(x2 ,Z2 =2) e f (x2 , z1 = 1) 2 −λd(x ,Z =1) 2 2 Q1|1 e + Q2|1 e−λd(x2 ,Z2 =2) (3.69) Since f (x2 , z1 = 1) = f (x2 , z1 = 2) if x2 = 00 or 11, the above equality simplifies as (e−λ − 1)f (x2 = 01, z1 = 1) (e−λ − 1)f (x2 = 10, z1 = 1) = Q1|1 e−λ + Q2|1 Q1|1 + Q2|1 e−λ (3.70) (1 − p)e−λ + p . 1 −λ (e + 1) 2 (3.71) Using (3.68), f (x2 = 01, z1 = 1) = f (x2 = 10, z1 = 2) = 112 f (x2 = 01, z1 = 2) = f (x2 = 10, z1 = 1) = pe−λ + (1 − p) . 1 −λ (e + 1) 2 (3.72) Combining (3.71) and (3.72) with (3.70), f (x2 = 01, z1 = 1) f (x2 = 10, z1 = 1) = −λ Q1|1 e + Q2|1 Q1|1 + Q2|1 e−λ (1 − p)(1 − e−2λ ) p(1 − e−2λ ) = Q2|1 Q1|1 1 −λ 1 −λ (e + 1) (e + 1) 2 2 pQ1|1 = (1 − p)Q2|1 (3.73) (3.74) (3.75) By plugging (3.75) into (3.67), it follows that Q1|1 = (1 − p) and Q2|1 = p. A similar analysis for the case where Z1 = 2 yields Q1|2 = p and Q2|2 = (1 − p). 
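The fixed point Q_1 = Q_2 = 1/2 in (3.64) can also be confirmed numerically. The sketch below (Python; the names and the iteration scheme are illustrative — a standard Blahut-Arimoto-style alternating update, not the derivation used in the proof) iterates the marginal update implied by (3.62) from an asymmetric starting point, and additionally evaluates the closed form of Theorem 15 at its endpoints.

```python
import math

STATES = ["00", "01", "10", "11"]   # joint states of the two channels, P = 1/4 each

def d(x, z):
    """Throughput distortion: 0 if scheduled channel z (1 or 2) is ON, else 1."""
    return 1.0 - int(x[z - 1])

def fixed_point(lam, iters=1000):
    """Alternating (Blahut-Arimoto-style) update for the marginal Q1 in (3.62),
    started from a deliberately asymmetric point."""
    q1 = 0.7
    for _ in range(iters):
        q2 = 1 - q1
        q1 = sum(0.25 * q1 * math.exp(-lam * d(x, 1)) /
                 (q1 * math.exp(-lam * d(x, 1)) + q2 * math.exp(-lam * d(x, 2)))
                 for x in STATES)
    return q1

print(fixed_point(lam=1.0))   # converges to 0.5, as in (3.64)

def hb(x):
    return 0.0 if x in (0.0, 1.0) else -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def rc(p, D):
    """Closed form of Theorem 15, eq. (3.60), valid for 1/4 <= D <= 1/2."""
    return 0.5 * hb(2 * p - 4 * p * D + 2 * D - 0.5) - 0.5 * hb(2 * D - 0.5)

print(rc(0.2, 0.25), rc(0.2, 0.5))   # Hb(0.2)/2 ~ 0.361 at D = 1/4, and 0 at D = 1/2
```

By the symmetry of the two channels, the iteration converges to the symmetric marginal regardless of the starting point, and the closed form vanishes at D = 1/2, where the oblivious encoder suffices.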
The above analysis can be used to solve for Q(zi |z1i−1 ) for any i. From (3.67), it follows that 1= X P (Xi |Xi+1 )e−λd(xi ,zi ) P (Xi+1 )e−λd(xi+1 ,zi+1 ) P i −λd(xi+1 ,zi+1 ) −λd(xi ,zi ) zi+1 Q(zi+1 |z1 )e zi Q(zi |zi−1 )e x X P xi+1 i X P (X1 |X2 )e−λd(x1 ,z1 ) P ··· −λd(x1 ,z1 ) z1 Q(z1 )e x (3.76) 1 i Define f (xi+1 1 , z1 ) as i f (xi+1 1 , z1 ) = X P (X1 |X2 )e−λd(x1 ,z1 ) X P (Xi |Xi+1 )e−λd(xi ,zi ) P P · · · −λd(xi ,zi ) −λd(x1 ,z1 ) Q(z |z )e i i−1 z z1 Q(z1 )e i x x (3.77) 1 i Lemma 9: The optimal values of Q(zi |z1i−1 ) = Q(zi |zi−1 ) for all 1 ≤ i ≤ n. Furthermore, for all i, Q(zi |zi−1 ) = 1 − p zi = zi−1 p zi 6= zi−1 (3.78) This Lemma shows that the optimal distributions q(zn |zn−1 , xn ) and Q(zn |zn−1 ) are stationary, and the rate distortion problem can be solved as a minimization over 113 a single letter. It is proved simultaneously with the following lemma. i Lemma 10. The function f (xi+1 1 , z1 ) satisfies i f (xi+1 1 , z1 ) = f (xi+1 , zi ) = X P (Xi |Xi+1 )e−λd(xi ,zi ) P −λd(xi ,zi ) zi Q(zi )e x (3.79) i Proof of Lemma 10. This is an inductive proof on the index i, with the base case (i = 2) provided in (3.71) and (3.72). Now assume f (xi1 , z1i−1 ) X P (Xi−1 |Xi )e−λd(xi−1 ,zi−1 ) P = f (xi , zi−1 ) = −λd(xi−1 ,zi−1 ) zi−1 Q(zi−1 )e x (3.80) i holds for i, and we will prove it holds for i + 1. 
Note that from (3.80) and (3.79), f (xi = 00, zi−1 ) = 1 (3.81) f (xi = 11, zi−1 ) = 1 (3.82) Q(Zi = 1|zi−1 )e−λ + Q(Zi = 2|zi−1 ) 1/2(e−λ + 1) Q(Zi = 1|zi−1 ) + Q(Zi = 2|zi−1 )e−λ f (xi = 10, zi−1 ) = 1/2(e−λ + 1) f (xi = 01, zi−1 ) = (3.83) (3.84) Therefore, for i + 1, X P (Xi |Xi+1 )e−λd(xi ,zi ) P f (xi , zi−1 ) −λd(xi ,zi ) Q(z |z )e i i−1 z i xi X P (Xi |Xi+1 )e−λd(xi ,zi ) X P (Xi−1 |Xi )e−λd(xi−1 ,zi−1 ) P P = −λd(xi ,zi ) −λd(xi−1 ,zi−1 ) Q(z |z )e i i−1 z zi−1 Q(zi−1 )e i x x i f (xi+1 1 , z1 ) = i (3.85) (3.86) i−1 = P (Xi = 00|Xi+1 ) + P (Xi = 11|Xi+1 ) + P (Xi = 01|Xi+1 )e−λd(01,zi ) f (01, zi−1 ) P (Xi = 10|Xi+1 )e−λd(10,zi ) f (10, zi−1 ) + Q(Zi = 1|Zi−1 )e−λ + Q(Zi = 2|Zi−1 ) Q(Zi = 1|Zi−1 ) + Q(Zi = 2|Zi−1 )e−λ (3.87) = P (Xi = 00|Xi+1 ) + P (Xi = 11|Xi+1 ) + P (Xi = 01|Xi+1 )e−λd(01,zi ) P (Xi = 10|Xi+1 )e−λd(10,zi ) + 1 −λ 1 −λ (e + 1) (e + 1) 2 2 114 (3.88) = X P (Xi |Xi+1 )e−λd(xi ,zi ) P −λd(xi ,zi ) zi Q(zi )e x (3.89) i Where equation (3.86) follows from the induction hypotheses of Lemmas 10 and 9, and (3.88) follows from (3.81)-(3.84). Proof of Lemma 9. The proof follows via induction. The base case, when i = 2, is proven above. Now, assume (3.78) holds for i; we will prove it holds for i + 1. From Lemma 10, (3.76) is rewritten as 1= X = X P P (Xi+1 )e−λd(xi+1 ,zi+1 ) f (xi+1 , zi ) i −λd(xi+1 ,zi+1 ) zi+1 Q(zi+1 |z1 )e (3.90) P X P (Xi |Xi+1 )e−λd(xi ,zi ) P (Xi+1 )e−λd(xi+1 ,zi+1 ) P , i −λd(xi+1 ,zi+1 ) −λd(xi ,zi ) Q(z )e i zi+1 Q(zi+1 |z1 )e z i x (3.91) xi+1 xi+1 i which has the same form as the equation in (3.67), and can be solved using the same method. Thus, Q(Zi+1 |Z1i ) = (1 − p) if Zi = Zi+1 and Q(Zi+1 |Z1i ) = p if Zi 6= Zi+1 . 
Using Lemma 9, equation (3.56) is equivalent to q(z1n |xn1 ) = n Y i=1 Q(zi |zi−1 ) exp − λd(xi , zi ) P Q(z |z ) exp − λd(x , z ) i i−1 i i zi (3.92) and Q(z |z ) exp − λd(x , z ) i i−1 i i q(zn |z1n−1 , xn ) = q(zn |xn , zn−1 ) = P zi Q(zi |zi−1 ) exp − λd(xi , zi ) (3.93) At the optimal point, the distortion constraint is satisfied with equality. N 1 X 1 XXX D= E[d(xi , zi )] = P (Xi )q(Zi |Xi )d(Xi , Zi ). N N i=1 z x N i=1 i (3.94) i By the stationarity of the optimal decision, q(Zi |Xi ) is given by equation (3.11), q(zi |xi ) = e−λd(xi ,zi ) e−λd(xi ,Zi =1) + e−λd(xi ,Zi =2) 115 (3.95) Substituting this in (3.94) yields e−λ = D − 41 2(D − 14 ) = 3 −D 1 − 2(D − 41 ) 4 (3.96) The dual variable λ is constrained to be positive, occurring when e−λ ≤ 1. Thus, the following only holds for 1 4 ≤ D ≤ 12 . Combining (3.96) with (3.93) and (3.94) yields expressions for q(zi |zi−1 , xi ) and q(zi |xi ) for all zi , xi , zi−1 . Using the stationarity of the solution distribution, 1 XX p(xn1 )q(z1n |xn1 ) log R(D) = n xn z n 1 = 1 XXXX xi−1 zi−1 xi q(z1n |xn1 ) P n n n x̂n p(x̂1 )q(z1 |x̂1 ) (3.97) 1 p(xi )q(zi−1 |xi−1 )p(xi−1 |xi )q(zi |zi−1 , xi ) log zi q(zi |zi−1 , xi ) Q(zi , zi−1 ) (3.98) 1 1 3 1 = Hb 2((1 − p)(D − 4 ) + p( 4 − D)) − Hb 2(D − 4 ) 2 3.7.3 (3.99) Proof of Proposition 4 Proposition 4: Let R(T ) and D(T ) denote the rate and expected distortion as functions of the parameter T respectively. For large T , the achievable R(D) curve for the above encoding algorithm, denoted by the points (D(T ), R(T )) has slope R(T + 1) − R(T ) −H(M ) = , T →∞ D(T + 1) − D(T ) c + 14 E[M ] lim (3.100) where M is a random variable denoting the expected number of slots after the initial T slots until the Zi sequence changes value, and c is a constant given by T X c= E[1(Xi = 00 or Xi = 01)] i=1 116 − E[1(Xi = 00 or Xi = 01)|X0 = 10] (3.101) Proof of Proposition 4. The rate-distortion curve is piecewise linear, with a slope s(T ) as a function of T . 
s(T) = \frac{R(T+1) - R(T)}{D(T+1) - D(T)}    (3.102)

Each of the quantities in (3.102) can be computed using renewal theory. Define a renewal to be the time when the sequence of encoded symbols changes value, i.e., Z_i \neq Z_{i-1}. Let L be a random variable denoting the interval between renewals. Each renewal interval can be broken into two parts: an interval of length T, corresponding to the period in which Z_i = Z_{i-1} deterministically, and an interval of length M, which is a random variable denoting the time after the initial T slots until a renewal occurs. Thus,

E[L] = T + E[M].    (3.103)

As T becomes large, the state saturates to the stationary distribution, implying that for asymptotically large T, the distribution of M is independent of T. Let D_T be the accumulated distortion over an interval of length T. The expected distortion D(T) can be written using renewal-reward theory as follows:

D(T) = \frac{D_T + \frac{1}{4} E[M]}{E[L]} = \frac{D_T + \frac{1}{4} E[M]}{T + E[M]}    (3.104)

and

D(T+1) = \frac{D_T + \frac{1}{2} + \frac{1}{4} E[M]}{T + E[M] + 1}.    (3.105)

Consider the rate of the encoded sequence. Note that the randomness in the sequence Z_1^L is entirely determined by the length of the renewal interval L. Thus, the rate of the encoded sequence is given by

R(T) = \frac{H(L)}{E[L]} = \frac{H(M)}{E[M] + T}.    (3.106)

Recall that for asymptotically large T, M is independent of T, and therefore

R(T+1) = \frac{H(L+1)}{E[L+1]} = \frac{H(M)}{E[M] + T + 1}.    (3.107)

Combining (3.104)-(3.107) with equation (3.102),

s(T) = \frac{\frac{H(M)}{E[M]+T+1} - \frac{H(M)}{E[M]+T}}{\frac{D_T + \frac{1}{2} + \frac{1}{4}E[M]}{T+E[M]+1} - \frac{D_T + \frac{1}{4}E[M]}{T+E[M]}}    (3.108)

= \frac{H(M)(E[M]+T) - H(M)(E[M]+T+1)}{(D_T + \frac{1}{2} + \frac{1}{4}E[M])(T+E[M]) - (D_T + \frac{1}{4}E[M])(T+E[M]+1)}    (3.109)

= \frac{-H(M)}{\frac{T}{2} + \frac{1}{4}E[M] - D_T}    (3.110)

The accumulated distortion over an interval of length T, D_T, is at most a fixed constant less than T/2. While the state process is approaching its steady-state distribution, the expected accumulated distortion will be less than T/2, and it will grow as T/2 after steady state is reached. Therefore, T/2 - D_T is equal to a fixed constant, denoted as c. Thus,

s(T) = \frac{-H(M)}{c + \frac{1}{4} E[M]},    (3.111)

which is constant with respect to T, implying that the rate distortion function is asymptotically linear.

Chapter 4

Centralized vs. Distributed: Wireless Scheduling with Delayed CSI

In order to schedule transmissions to achieve maximum throughput, a centralized scheduler must opportunistically make decisions based on the current state of each time-varying channel [61]. The channel state of a link can be measured by its adjacent nodes, which forward this channel state information (CSI) across the network to the scheduler. CSI updates can be piggy-backed on top of data transmissions, or sent before data transmissions if time slots are large enough. However, due to the transmission and propagation delays over the wireless links, it may take several time slots for the scheduler to collect CSI throughout the network, and in that time the network state may have changed. While the majority of works on wireless scheduling assume current CSI is globally available, in practice the available CSI is a delayed view of the network state. Furthermore, the delay in CSI is proportional to the distance of each link from the controller, since CSI updates must traverse the network. Due to the memory in the channel state process, delayed CSI can still be used for scheduling; however, the presence of delays results in a lower expected throughput [41].

Centralized scheduling algorithms, in which a central entity makes a scheduling decision for the entire network, yield high theoretical performance, since the central entity uses current CSI throughout the network to compute a globally optimal schedule. However, maintaining current CSI is impractical, due to the latency in acquiring CSI throughout the network. An alternative is a distributed approach, in which each node makes an independent transmission decision based on locally available CSI.
These distributed algorithms require no exchange of state information across the network; however, nodes must coordinate to avoid interference. Moreover, local decisions made by distributed algorithms may not achieve a global optimum. As a consequence, distributed algorithms typically achieve only a fraction of the throughput of their ideal centralized counterparts [46]. However, due to delays in gathering CSI for a centralized approach, distributed scheduling may result in a comparatively higher throughput. In this chapter, we propose a new model for delayed CSI to capture the effect of distance on CSI accuracy. Under this framework, nodes have more accurate CSI pertaining to neighboring links, and progressively less accurate CSI for distant links. We demonstrate that a distributed scheduling approach often outperforms the optimal centralized approach with delayed CSI. In doing so, we illustrate the effect that delays in CSI have on the throughput performance of centralized scheduling. We show that as the memory in the channel state process decreases, there exists a distributed policy that outperforms the optimal centralized policy. Additionally, we develop sufficient conditions under which there exists a distributed scheduling policy that outperforms the optimal centralized policy in tree and clique networks, illustrating the impact of topology on achievable throughput. We provide simulation results to demonstrate the performance on different topologies, showing that distributed scheduling eventually outperforms centralized scheduling with delayed CSI. Lastly, we propose a partially distributed scheme, in which a network is partitioned into subgraphs and a controller is assigned to each subgraph. This approach combines elements from centralized and distributed scheduling in order to trade-off between the effects of delayed CSI and the sub-optimality of local decisions. 
We show that there exists a regime in which this approach outperforms both the fully distributed and centralized approaches. 120 As mentioned in Chapter 1, Ying and Shakkottai study throughput optimal scheduling and routing with delayed CSI and QLI [69]. They show that the throughput optimal policy activates a max-weight schedule, where the weight on each link is given by the product of delayed queue length and the conditional expected channel state given the delayed CSI. Additionally, they propose a threshold-based distributed policy which is shown to be throughput optimal (among a class of distributed policies). In their work, the authors assume arbitrary delays and do not consider the dependence of delay on the network topology. In contrast, by accounting for the relationship between CSI delay and network topology, we are able to effectively compare centralized and distributed scheduling. The remainder of this chapter is organized as follows. In Section 4.1, we present the mathematical model and problem formulation used in this work, elaborating on the structure of delayed CSI, as well as the properties of centralized and distributed algorithms. In Section 4.2, we show that as the memory in the channel state process decreases, distributed scheduling eventually outperforms centralized scheduling. In Sections 4.3 and 4.4, we analytically characterize the expected throughput in tree and clique topologies respectively. We present simulation results comparing centralized and distributed policies in Section 4.5. Lastly, in Section 4.6, we introduce a graph partitioning scheme for binary trees and show that there exists a partially distributed approach that outperforms both the fully centralized and distributed approaches. 4.1 Model and Problem Formulation Consider a network consisting of a set of nodes N , and a set of links (sourcedestination pairs) denoted by L. Time is slotted, and in each slot a set of links is chosen to transmit. 
This set of activated links must satisfy an interference constraint. In this work, we use a primary interference model, in which each node is constrained to activate only one neighboring link. In other words, the set of activated links forms a matching¹, as shown in Figure 4-1.

¹A matching is a set of links such that no two links share an endpoint.

Figure 4-1: Feasible link activation under primary link interference. Bold edges represent activated links.

Each link l \in L has a time-varying channel state s_l \in \{0, 1\}, governed by the Markov chain in Figure 4-2. The state of the channel at link l represents the rate at which data can be transmitted over that link. A channel state of 0 implies that the channel is in an OFF state, and no data can be transmitted over that link. A channel state of 1 corresponds to an ON channel, which can support a unit throughput (i.e., one packet transmission per slot). We are interested in a link activation policy which maximizes the average sum-rate in the network.

Figure 4-2: Markov chain describing the channel state evolution of each independent channel. The OFF-to-ON transition occurs with probability p, and the ON-to-OFF transition with probability q.

4.1.1 Delayed Channel State Information

An efficient link activation depends on the current state of each channel. Assume that every node has CSI pertaining to each link in the network; however, this information is delayed by an amount of time proportional to the distance between the node and the link in question. Specifically, a node n has k-step delayed CSI of links in N_{k+1}(n), where N_k(n) is the set of links that are k hops away from n. In other words, each node has current CSI pertaining to its adjacent links, 1-hop delayed CSI of its 2-hop neighboring links, and so on, as shown in Figure 4-3. This results in each node having progressively less accurate information about more distant links, modeling the effect of propagation and transmission delays on the process of collecting CSI.

Figure 4-3: Delayed CSI structure for centralized scheduling.
Controller (denoted by crown) has full CSI of red bold links, one-hop delayed CSI of green dashed links, and two-hop delayed CSI of blue dotted links. 4.1.2 Scheduling Disciplines We compare scheduling disciplines based on which nodes make transmission decisions, and what CSI is available. In particular, we compare policies which are centralized, where one controller uses delayed CSI to make a decision for the entire network, and policies which are distributed, where multiple controllers make decisions based only on current local information. Centralized Scheduling A centralized scheduling algorithm consists of a single entity making a global scheduling decision for the entire network. In this work, one node is appointed to be the centralized decision-maker, referred to as the controller. The controller has delayed CSI of each link, where the delay is relative to that links distance from the controller, and makes a scheduling decision based on the delayed CSI. This decision is then 123 broadcasted across the network. We assume the centralized controller broadcasts the chosen schedule to the other nodes in the network instantaneously. In practice, the decision takes a similar amount of time to propagate from the controller as the time required to gather CSI, which effectively doubles the impact of delay in the CSI. Therefore, the theoretical performance of the centralized scheduling algorithm derived in this work is an upper bound on the performance achievable in practice. Let Sl (t) be the state of the channel associated with link l at time t, and let dr (l) be the distance (in hops) of that link from the controller r. The controller has an estimate of this state based on the delayed CSI. Define the belief of a channel to be the probability that a channel is ON given the available CSI at the controller. For link l, the belief xl (t) is given by xl (t) = P Sl (t) = 1Sl (t − dr (l)) . 
(4.1)

The belief is derived from the k-step transition probabilities of the Markov chain in Figure 4-2. Namely,

$\mathbb{P}\big(S(t) = j \mid S(t-k) = i\big) = p^k_{ij},$   (4.2)

where $p^k_{ij}$ is computed as

$p^k_{00} = \frac{q + p(1-p-q)^k}{p+q}, \quad p^k_{01} = \frac{p - p(1-p-q)^k}{p+q},$
$p^k_{10} = \frac{q - q(1-p-q)^k}{p+q}, \quad p^k_{11} = \frac{p + q(1-p-q)^k}{p+q}.$   (4.3)

Throughout this work, assume that 1 − q ≥ p, corresponding to “positive memory,” i.e., an ON channel is more likely to remain ON than to turn OFF. Consequently, the k-step transition probabilities satisfy the following inequality:

$0 \le p^i_{01} \le p^j_{01} \le p^k_{11} \le p^l_{11} \le 1 \quad \forall i \le j,\ \forall l \le k.$   (4.4)

As the CSI of a channel grows stale, the probability that the channel is ON is given by the stationary distribution of the chain in Figure 4-2, denoted by π:

$\lim_{k\to\infty} p^k_{01} = \lim_{k\to\infty} p^k_{11} = \pi = \frac{p}{p+q}.$   (4.5)

Since the objective is to maximize the expected sum-rate throughput, the optimal scheduling decision at each time slot is given by the maximum likelihood (ML) rule, which is to activate the links that are most likely to be ON, i.e., the links with the highest belief. Under the primary interference constraint, a set of links can only be scheduled simultaneously if that set forms a matching. Let M be the set of all matchings in the network. The maximum expected sum-rate is formulated as

$\max_{m \in \mathcal{M}} \mathbb{E}\Big[\sum_{l \in m} S_l(t) \,\Big|\, \{S_l(t - d_r(l))\}_{l \in L}\Big]$   (4.6)
$= \max_{m \in \mathcal{M}} \sum_{l \in m} \mathbb{E}\big[S_l(t) \mid S_l(t - d_r(l))\big]$   (4.7)
$= \max_{m \in \mathcal{M}} \sum_{l \in m} x_l(t).$   (4.8)

Thus, the optimal schedule is a maximum weighted matching, where the weight of each link is equal to the belief at the current time.

Distributed Scheduling

A distributed scheduling algorithm consists of multiple entities making independent decisions without explicit coordination. In this work, each node makes a transmission decision for its neighboring links using only local information (CSI of adjacent links), which is readily available at each node, resulting in performance that is unaffected by the delay in CSI.
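The belief computation of (4.1)–(4.5) can be sketched in a few lines. This is a sketch of ours, not code from the thesis; the function names are assumptions. It uses the closed form $p^k_{s,1} = \pi + (s - \pi)\mu^k$, which follows from (4.3) with $\mu = 1 - p - q$.

```python
# Sketch (ours, not from the thesis): belief of (4.1) via the k-step
# transition probability p^k_{s,1} = pi + (s - pi) * mu^k, from (4.3).

def k_step_prob(s, k, p, q):
    """P(S(t) = 1 | S(t - k) = s) for the ON/OFF chain of Figure 4-2."""
    pi = p / (p + q)        # stationary ON probability, (4.5)
    mu = 1.0 - p - q        # channel memory, (4.11)
    return pi + (s - pi) * mu ** k

def belief(delayed_csi, delays, p, q):
    """Belief x_l(t) of each link, given its delayed CSI and delay d_r(l)."""
    return [k_step_prob(s, d, p, q) for s, d in zip(delayed_csi, delays)]

# Example: a link observed ON 3 slots ago is still likely ON under
# positive memory (1 - q >= p), but its belief decays toward pi.
print(belief([1, 0], [3, 3], p=0.1, q=0.2))
```

Under the ML rule (4.6)–(4.8), these beliefs would then serve as the link weights of a maximum weighted matching.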
The drawback of such policies is that transmission decisions made using local information may not be globally optimal.

In order to avoid collisions, we consider distributed policies in which decisions are made sequentially. As a consequence, in addition to having local information, each node observes the actions of neighboring nodes, in a manner similar to collision avoidance in a CSMA-CA system. If a node begins transmission, neighboring nodes detect this transmission and activate a non-conflicting link rather than an interfering link. This allows us to focus on the sub-optimality resulting from making a local instead of a global decision, rather than the transmission coordination needed to avoid collisions. Moreover, alternative transmission coordination schemes are also possible based on RTS/CTS exchanges [42]. To determine the sequence in which decisions are made, priorities are assigned to each node off-line, and transmissions are made in order of node priority.

Figure 4-4: Example network: all links are labeled by their channel state at the current time. Bold links represent activated links. (a) Suboptimal matching. (b) Optimal matching.

While distributed algorithms are designed to avoid collisions between neighboring transmitters, the restriction to using only local information results in distributed algorithms suffering from “missed opportunities”. In graph-theoretic language, the distributed scheduler returns a maximal matching, or a local optimum, rather than a maximum matching, or a global optimum. For example, in Figure 4-4, node n can choose to schedule either of its neighboring links; if it schedules its right child link, then the total sum rate of the resulting schedule is 1, as in Figure 4-4a, whereas scheduling the left link results in a sum rate of 2, as in Figure 4-4b.
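The gap between a maximal and a maximum matching can be reproduced in a small sketch. This is our own illustration on a hypothetical 4-node path a–n–b–c with all three links ON, not the exact graph of Figure 4-4: a greedy sequential activation in priority order returns a maximal matching, while exhaustive search returns the maximum matching.

```python
# Sketch (ours): greedy sequential activation vs. exhaustive maximum
# matching on a hypothetical 4-node path a-n-b-c, all links ON.
from itertools import combinations

links = {('a', 'n'): 1, ('n', 'b'): 1, ('b', 'c'): 1}  # link -> channel state

def is_matching(subset):
    nodes = [u for l in subset for u in l]
    return len(nodes) == len(set(nodes))

def max_matching_rate(links):
    """Sum-rate of a maximum matching, by brute force (global optimum)."""
    edges = list(links)
    return max(sum(links[l] for l in sub)
               for r in range(len(edges) + 1)
               for sub in combinations(edges, r) if is_matching(sub))

def greedy_rate(links, order):
    """Activate ON links sequentially in priority order, skipping conflicts."""
    active, used = 0, set()
    for l in order:
        if links[l] == 1 and not (set(l) & used):
            active += 1
            used |= set(l)
    return active

print(max_matching_rate(links))                               # 2
print(greedy_rate(links, [('n', 'b'), ('a', 'n'), ('b', 'c')]))  # 1
```

With node n deciding first and picking its link toward b, the greedy schedule is stuck at sum rate 1, while the maximum matching {(a, n), (b, c)} achieves 2.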
In a distributed framework, node n is unaware of the state of the rest of the network, so it makes an arbitrary decision, resulting in a throughput loss. If node n were a centralized controller (with perfect CSI), it would always return the schedule in Figure 4-4b. Moreover, the loss in efficiency due to suboptimal decisions becomes more pronounced when moving beyond the simple two-state channel model. (Here we assume a small propagation delay, such that nodes can immediately detect if a neighbor is transmitting.)

Partially Distributed Scheduling

A third class of scheduling algorithms combines elements of centralized and distributed scheduling. These algorithms are referred to as partially distributed scheduling algorithms. A partially distributed approach divides the network into multiple control regions, and assigns a controller to schedule the links in each region. Each controller has delayed CSI pertaining to the links in its control region, and no CSI pertaining to links in other regions. This allows for scheduling with fresher information than a purely centralized scheme, and less local sub-optimality than a fully distributed scheme. These policies are explored in Section 4.6.

4.2 Centralized vs. Distributed Scheduling

In the previous section, we introduced two primary classes of scheduling policies: distributed and centralized policies. It is known that a centralized scheme using perfect CSI outperforms distributed schemes, due to the aforementioned loss of efficiency in localized decisions. However, these results ignore the effect of delays in CSI. In this section, we revisit the comparison between centralized and distributed scheduling, taking into account the delay in collecting CSI. We show that for sufficiently large CSI delays, distributed policies perform at least as well as the optimal centralized policy.

Figure 4-5: Four-node ring topology, with nodes labeled 1 through 4.
As an example, consider the four-node network in Figure 4-5, and a symmetric channel state model satisfying p = q. Without loss of generality, assume node 1 is the controller. In a centralized scheduling scheme, node 1 chooses a schedule based on current CSI for links (1, 2) and (1, 4), and 1-hop delayed CSI for links (2, 3) and (3, 4). The resulting expected throughput is computed by conditioning on the state of each link.

$C(p) = \frac{1}{4}\left(\frac{3}{4}(1-p) + \frac{1}{4}p\right) + \frac{1}{2}\left(1 + \frac{1}{2}\right) + \frac{1}{4}\left(1 + \frac{3}{4}(1-p) + \frac{1}{4}p\right) = \frac{11}{8} - \frac{1}{4}p.$   (4.9)

Now consider a distributed schedule, in which node 1 makes a scheduling decision based on the state of adjacent links (1, 2) and (1, 4). After this decision is made, node 3 makes a non-conflicting decision based on the state of links (3, 2) and (3, 4). The resulting expected throughput is given by

$D = \frac{1}{4}\cdot\frac{3}{4} + \frac{3}{4}\left(1 + \frac{1}{2}\right) = \frac{21}{16}.$   (4.10)

The expected throughput for centralized and distributed scheduling in (4.9) and (4.10) is plotted in Figure 4-6. As the channel transition probability p increases, the memory in the channel decreases, and the expected throughput of a centralized scheduler decreases. The distributed scheduler, on the other hand, is unaffected by the channel transition probability, as it only uses non-delayed local CSI. For channel transition probabilities p ≥ 1/4, distributed scheduling outperforms centralized scheduling over this network.

Figure 4-6: Expected sum-rate throughput for centralized and distributed scheduling algorithms over the four-node ring topology, as a function of channel transition probability p.

The throughput degradation of the centralized scheme is a function of the memory in the channel state process. Let µ be a metric reflecting this memory. In the case of a two-state Markov chain, we define

$\mu \triangleq 1 - p - q.$   (4.11)

Note that µ is the second eigenvalue of the transition matrix for the two-state Markov chain, and thus represents the rate at which the chain converges to its steady-state
distribution [21].

Theorem 17. For a fixed steady-state probability π, there exists a threshold µ∗ such that if µ ≤ µ∗, there exists a distributed scheduling policy that outperforms the optimal centralized scheduling policy.

In order to prove Theorem 17, we present several intermediate results pertaining to the expected sum-rate throughput of both the distributed and centralized schemes.

Lemma 11. For a fixed steady-state probability π, and state transition probabilities p and $q = \frac{1-\pi}{\pi}p$, the expected sum-rate of any distributed policy is independent of the channel memory µ.

Proof. This follows from the definition of a distributed policy in Section 4.1.2. Since distributed policies are restricted to using only the CSI of neighboring links, which is available to each node without delay, the values of p and q do not affect the sum-rate. The expected sum-rate of a distributed policy depends only on the steady-state probability that links are ON. For fixed π, the expected sum-rate of the distributed policy is constant.

Lemma 12. The expected sum-rate of the optimal centralized policy is greater than or equal to that of any distributed policy when µ = 1.

Proof. When µ = 1, there is full memory in the channel state process, i.e., p = 0 and q = 0. In this case, the centralized policy has perfect CSI throughout the network, and activates the sum-rate maximizing schedule, representing a globally optimal solution.

Lemma 13. There exists a distributed policy with sum rate greater than or equal to the sum rate of the optimal centralized policy when µ = 0.

Proof. If µ = 0, then the channel transition probabilities p and q satisfy p = 1 − q. In this scenario, there is no memory in the channel state process, and delayed CSI is useless in predicting the current channel state. To see this, consider the conditional probability of the current channel state given the previous channel state.
$\mathbb{P}(S(t+1) = 1 \mid S(t) = 0) = p = 1 - q = \mathbb{P}(S(t+1) = 1 \mid S(t) = 1)$   (4.12)
$\mathbb{P}(S(t+1) = 0 \mid S(t) = 0) = 1 - p = q = \mathbb{P}(S(t+1) = 0 \mid S(t) = 1)$   (4.13)

Thus, when µ = 0, the channel state process is IID over time.

Let G be the graph representing the topology of the network, with the controller labeled as node 0. Let N0 be the set of neighbors of node 0, and ∆ be the degree of node 0, i.e., ∆ = |N0|. Let G0 ⊂ G be the graph obtained by removing the links adjacent to the controller from the network. Similarly, let Gi ⊂ G0 be the graph obtained by removing the links adjacent to node i from G0. Recall that a matching M of a graph G is any subset of the edges of G such that no two edges share a node. Let M0 be a maximum (cardinality) matching over G0, and Mi be a maximum cardinality matching over Gi.

Due to the IID channel process, each link adjacent to the controller has belief either 0 or 1, and each non-adjacent link has belief π. Thus, the optimal centralized scheduler observes the state of its adjacent links and chooses a maximum throughput link activation. There are 2^∆ possible state combinations observed by the controller; however, due to the fact that the controller can only activate one adjacent link, the optimal centralized schedule is one of at most ∆ + 1 matchings. Without loss of generality, when the controller does not activate an adjacent link, it activates matching M0, and if the controller activates link (0, i) for i ∈ N0, then it also activates matching Mi.

Lemma 13 is proved by constructing a distributed policy which activates the same links as the optimal centralized schedule. The ∆ + 1 potential activations can be computed off-line, and we assume each node knows the set of possible activations. Each node must determine which activation to use in a distributed manner. To accomplish this, node 0 activates the same adjacent link as in the centralized scheme, which is feasible since the centralized controller uses only local CSI when µ = 0.
Every other node n activates links according to the matching M0, unless that activation interferes with a neighboring activation. If a conflict occurs, then node 0 must have transmitted according to some other Mi for i ∈ N0, and node n detects this conflict and activates links according to the appropriate Mi. The remainder of the proof explains the details of this distributed algorithm. (To compute the set of potential activations, consider the case where only one link adjacent to the controller is ON, as well as the case where all adjacent links are OFF.)

Figure 4-7: Example of combining matchings to generate components. Red links and blue links correspond to maximum cardinality matchings M0 and Mi. The component containing node i is referred to as path Pi.

Consider the graph composed of the nodes in G and the edges in both M0 and Mi, as done in [47], labeling edges in M0 as red and edges in Mi as blue. An example is shown in Figure 4-7. The resulting graph consists of multiple connected components, where each component is either a path or a cycle alternating between red and blue links. Note that every component not containing node i has the same number of red and blue links, since both matchings have maximum cardinality. Consider the component including node i, which must be a path, since no blue links can be adjacent to node i. Denote this path as Pi. If node 0 schedules link (0, i), then nodes in path Pi must schedule blue links instead of red links. Since each node detects neighboring transmissions, this can be accomplished in a distributed manner. In all other components, either red links or blue links can be scheduled to obtain maximum throughput, because each component has an equal number of red and blue links, and switching between red and blue links will not affect any other components.

Figure 4-8: Abstract representation of a node n's position on multiple conflicting paths. (a) Scenario 1: a conflict detected from neighbor x corresponds to matching Mi, and a conflict detected from neighbor y corresponds to matching Mj. (b) Scenario 2: node n can activate according to either Mi or Mj if a conflict is detected at neighbor x.

The remaining detail concerns the decision of which of the ∆ alternate matchings to use if M0 conflicts with a neighboring transmission. As explained above, node n is informed of the switch to matching Mi by blue links being activated on path Pi, propagating from node i. If node n does not lie on any path Pi for i ∈ N0, then activating links according to matching M0 never conflicts with any other transmission at node n. If node n lies on a single path Pi, then upon detecting a conflicting transmission, node n switches to matching Mi. If there are i, j ∈ N0 such that n ∈ Pi and n ∈ Pj, then node n decides between Mi and Mj based on the direction (neighbor) from which the conflicting transmission is detected, as illustrated in Figure 4-8a. If Pi and Pj are such that the conflicting link at node n is detected from the same neighbor, as in Figure 4-8b, then either Mi or Mj can be used.

Lemma 14. Let C(p, q) be the sum-rate of the optimal centralized algorithm as a function of the channel transition probabilities p and q. For a fixed value of π, C(p, q) is monotonically increasing in µ = 1 − p − q.

Proof. Let Φ represent the set of feasible schedules (matchings), and let φ ∈ Φ be a binary vector, such that φ_l indicates whether link l is activated in the schedule. Consider two channel-state distributions, one with transition probabilities p1 and q1, and the other with probabilities p2 and q2, satisfying π1 = π2 = π. Furthermore, assume that µ1 ≥ µ2. Let $a^k_{s,1}$ ($b^k_{s,1}$) represent the k-step transition probability from state s to state 1 when the one-step transition probabilities are p1 and q1 (p2 and q2).
Lastly, let d_r(l) be the distance of link l from controller r, and let $\mathbf{S}(t - d_r) = \{S_l(t - d_r(l))\}_{l \in L}$ be the delayed CSI vector, where the l-th element is the delayed CSI of link l, with delay equal to d_r(l) slots. Let φ¹(s) and φ²(s) be binary vectors representing the optimal schedules for state s when the state transition probabilities are (p1, q1) and (p2, q2) respectively, with an arbitrary rule for breaking ties, i.e.,

$\varphi^1(s) = \arg\max_{\varphi \in \Phi} \sum_{l \in L} \varphi_l\, a^{d_r(l)}_{s_l,1}$   (4.14)
$\varphi^2(s) = \arg\max_{\varphi \in \Phi} \sum_{l \in L} \varphi_l\, b^{d_r(l)}_{s_l,1}.$   (4.15)

The expected sum-rate of the centralized scheme is expressed as

$C(p_1, q_1) = \sum_{s \in S} \mathbb{P}(\mathbf{S}(t - d_r) = s) \sum_{l \in L} \varphi^1_l(s)\, a^{d_r(l)}_{s_l,1}$   (4.16)
$C(p_2, q_2) = \sum_{s \in S} \mathbb{P}(\mathbf{S}(t - d_r) = s) \sum_{l \in L} \varphi^2_l(s)\, b^{d_r(l)}_{s_l,1}.$   (4.17)

To prove the monotonicity of C(p, q), we show that for all p1, q1, p2, q2 satisfying π1 = π2 and µ1 ≥ µ2,

$C(p_1, q_1) - C(p_2, q_2) \ge 0.$   (4.18)

The above difference is bounded as follows:

$C(p_1, q_1) - C(p_2, q_2) = \sum_{s \in S} \mathbb{P}(\mathbf{S}(t - d_r) = s) \sum_{l \in L} \varphi^1_l(s)\, a^{d_r(l)}_{s_l,1} - \sum_{s \in S} \mathbb{P}(\mathbf{S}(t - d_r) = s) \sum_{l \in L} \varphi^2_l(s)\, b^{d_r(l)}_{s_l,1}$   (4.19)
$\ge \sum_{s \in S} \mathbb{P}(\mathbf{S}(t - d_r) = s) \sum_{l \in L} \varphi^2_l(s) \left(a^{d_r(l)}_{s_l,1} - b^{d_r(l)}_{s_l,1}\right),$   (4.20)

where the inequality follows from the fact that φ² is the maximizing schedule for channel 2, and not channel 1. The proof proceeds by partitioning the state space into sets of states such that every state in a set yields the same optimal schedule. Let S_φ ⊂ S be the set of states for which φ is the optimal schedule, i.e.,

$S_\varphi = \{s \in S \mid \varphi^2(s) = \varphi\}.$   (4.21)

Due to the arbitrary tie-breaking rule in the optimization of φ²(s) in (4.15), each s ∈ S belongs to exactly one S_φ. In other words, the sets $\{S_{\varphi_i}\}_i$ are disjoint, and $\bigcup_{\varphi \in \Phi} S_\varphi = S$. Therefore, (4.20) can be rewritten as

$C(p_1, q_1) - C(p_2, q_2) \ge \sum_{\varphi \in \Phi} \sum_{s \in S_\varphi} \mathbb{P}(\mathbf{S}(t - d_r) = s) \sum_{l \in L} \varphi_l \left(a^{d_r(l)}_{s_l,1} - b^{d_r(l)}_{s_l,1}\right).$   (4.22)

The quantity $a^{d_r(l)}_{s_l,1} - b^{d_r(l)}_{s_l,1}$ simplifies using (4.3) and µ_i = 1 − p_i − q_i:
$a^{d_r(l)}_{s_l,1} - b^{d_r(l)}_{s_l,1} = \pi + (s_l - \pi)\mu_1^{d_r(l)} - \pi - (s_l - \pi)\mu_2^{d_r(l)}$   (4.23)
$= (s_l - \pi)\left(\mu_1^{d_r(l)} - \mu_2^{d_r(l)}\right)$   (4.24)

Combining (4.22) and (4.24) yields

$C(p_1, q_1) - C(p_2, q_2) \ge \sum_{\varphi \in \Phi} \sum_{s \in S_\varphi} \mathbb{P}(\mathbf{S}(t - d_r) = s) \sum_{l \in L} \varphi_l\, (s_l - \pi)\left(\mu_1^{d_r(l)} - \mu_2^{d_r(l)}\right)$   (4.25)
$= \sum_{\varphi \in \Phi} \sum_{s \in S_\varphi} \sum_{l \in L} \varphi_l \prod_{j \in L} \mathbb{P}(S_j(t - d_r(j)) = s_j)\, (s_l - \pi)\left(\mu_1^{d_r(l)} - \mu_2^{d_r(l)}\right)$   (4.26)
$= \sum_{\varphi \in \Phi} \sum_{l \in L} \sum_{s \in S_\varphi} \varphi_l\, \pi(1-\pi)(-1)^{1-s_l}\left(\mu_1^{d_r(l)} - \mu_2^{d_r(l)}\right) \prod_{j \in L \setminus l} \mathbb{P}(S_j(t - d_r(j)) = s_j),$   (4.27)

where (4.26) follows from the independence of the channel state process across links, and (4.27) follows from:

$\mathbb{P}(S_l(t - d_r(l)) = s_l)(s_l - \pi) = (\pi s_l + (1-\pi)(1-s_l))(s_l - \pi)$   (4.28)
$= \pi s_l (s_l - \pi) + (1-\pi)(1-s_l)(s_l - \pi)$   (4.29)
$= (-1)^{1-s_l}\pi(1-\pi)$   (4.30)

We prove that for any schedule φ ∈ Φ and link l ∈ L,

$\sum_{s \in S_\varphi} \varphi_l\, \pi(1-\pi)(-1)^{1-s_l}\left(\mu_1^{d_r(l)} - \mu_2^{d_r(l)}\right) \prod_{j \in L \setminus l} \mathbb{P}(S_j(t - d_r(j)) = s_j) \ge 0.$   (4.31)

Fix a schedule φ ∈ Φ and a link l ∈ L. The summand in (4.31) is non-zero only if φ_l = 1, i.e., link l is in the schedule φ. The summand is negative if and only if s_l = 0. Consider a delayed CSI vector s ∈ S_φ such that s_l = 0, and the delayed CSI vector s̄ obtained by changing the l-th element of s to 1, i.e., s̄_j = s_j for all j ≠ l, and s̄_l = 1. Since s ∈ S_φ, it follows that s̄ ∈ S_φ. This is because link l is scheduled under φ, and the throughput obtained by scheduling link l strictly increases in moving from s to s̄, so the same schedule must remain optimal. Therefore, for every element s ∈ S_φ contributing a negative term to the summation in (4.31), there exists another state s̄ ∈ S_φ contributing a positive term of equal magnitude, implying that the entire summation must be non-negative.

Proof of Theorem 17. Let C(µ) be the expected sum-rate throughput of the optimal centralized algorithm as a function of the memory in the channel.
This theorem is proved by showing that there exists a distributed policy with expected sum-rate D(π), such that the relationship between C(µ) and D(π) is similar to that in Figure 4-6 for fixed π. Since C(µ) is monotonically increasing in µ (Lemma 14), with C(1) ≥ D(π) (Lemma 12) and C(0) ≤ D(π) (Lemma 13), and D(π) is constant over µ for fixed π (Lemma 11), C(µ) must intersect D(π), and this intersection occurs at µ∗ for some 0 ≤ µ∗ ≤ 1.

Theorem 17 proves the existence of a threshold µ∗ such that for µ ≤ µ∗, distributed scheduling outperforms the optimal centralized scheduler. The value of µ∗ depends on the topology, and in general, this threshold is difficult to compute. In some topologies, µ∗ is 0 or 1, implying that distributed scheduling is always optimal, or that distributed scheduling is only optimal if there is no memory in the channel. In the following sections, we characterize the value of µ∗ in tree networks (Section 4.3) and clique networks (Section 4.4), and show that for large networks, µ∗ approaches 1.

4.3 Tree Topologies

In this section, we characterize the expected throughput over networks with tree topologies. The acyclic nature of these graphs makes them amenable to analysis. We focus on rooted trees, such that one node is the root and every other node has a depth equal to its distance to the root. Furthermore, for any node v, the nodes that are connected to v but have depth greater than v are referred to as children of v, and children of v are siblings of one another. If u is a child of v, then v is the parent of u. This familial nomenclature is standard in the graph-theoretic literature [26], and simplifies the description of the algorithms over tree networks. A complete k-ary tree of depth n is a tree such that each node of depth less than n has k children, and the nodes at depth n are leaf nodes, i.e., they have no children.
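The familial definitions above can be made concrete with a small helper. This is our own sketch; the representation and names are assumptions, not from the thesis.

```python
# Sketch (ours): a complete k-ary tree of depth n as adjacency dictionaries,
# using the root/child/depth vocabulary defined above.

def complete_kary_tree(k, n):
    """Return (children, depth) dicts for a complete k-ary tree of depth n."""
    children, depth = {0: []}, {0: 0}
    frontier, next_id = [0], 1
    for d in range(1, n + 1):
        new_frontier = []
        for v in frontier:
            for _ in range(k):          # every node above depth n has k children
                children[v].append(next_id)
                children[next_id] = []
                depth[next_id] = d
                new_frontier.append(next_id)
                next_id += 1
        frontier = new_frontier
    return children, depth

children, depth = complete_kary_tree(2, 3)
# A complete binary tree of depth 3 has 2^4 - 1 = 15 nodes and 14 links.
assert len(children) == 15
assert all(not children[v] for v in children if depth[v] == 3)  # leaves
```

A structure of this form is what the recursive sum-rate computations of the following subsections walk over, one subtree at a time.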
Additionally, this section focuses on symmetric channel models such that p = q to simplify the analysis, but the results are easily extended to asymmetric channels as well. (Note that Figure 4-6 presents throughput as a decreasing function of p, whereas Theorem 17 is stated in terms of an increasing function of µ.)

4.3.1 Distributed Scheduling on Tree Networks

Consider applying the distributed scheduling algorithm of Section 4.1.2 over a complete k-ary tree of depth n, where priorities are assigned in order of depth (lower depth has higher priority). The root node first makes a decision for its neighboring links. Then, the children of the root attempt to activate one of their child links, if this activation does not conflict with their parent's decision. Consequently, the average sum rate can be written recursively. Let $D^k_n$ be the average sum rate of the distributed algorithm over a complete k-ary tree of depth n. To begin, consider the case of k = 2 (binary tree).

Figure 4-9: Recursive distributed scheduling over binary trees. (a) When both links adjacent to the root are OFF, neither is scheduled; the expected throughput is $2D^2_{n-1}$. (b) When at least one link is ON, it is scheduled; the dotted links cannot be activated, so the expected throughput is $1 + D^2_{n-1} + 2D^2_{n-2}$.

$D^2_n = \frac{1}{4}\cdot 2D^2_{n-1} + \frac{3}{4}\left(1 + D^2_{n-1} + 2D^2_{n-2}\right) = \frac{3}{4} + \frac{5}{4}D^2_{n-1} + \frac{3}{2}D^2_{n-2}$   (4.32)

Equation (4.32) follows from conditioning on the links adjacent to the root. The first term corresponds to the case where both links are OFF. In this case, neither is activated, and the algorithm recurses over the subtrees rooted at the children, as in Figure 4-9a. If at least one link is ON, it is activated. In this case, that child cannot transmit, so control passes to the grandchildren, as in Figure 4-9b. Solving the above recursion for n ≥ 1 using $D^2_0 = D^2_{-1} = 0$ yields

$D^2_n = -\frac{9}{77}\left(-\frac{3}{4}\right)^n + \frac{6}{11}\cdot 2^n - \frac{3}{7}.$   (4.33)

The average sum-rate in (4.33) of the distributed scheduling algorithm is independent of the link transition probability p, as each node only uses the CSI of its neighboring links, which is available without delay. This follows from Lemma 11.

Consider the asymptotic per-link throughput as the number of links grows large. An n-level binary tree has $2^{n+1} - 2$ links. Using the expression in (4.33), and taking the limit as n grows large while dividing by the number of links, yields

$\lim_{n\to\infty} \frac{D^2_n}{2^{n+1} - 2} = \frac{3}{11}.$   (4.34)

Thus, the distributed priority algorithm achieves a throughput of at least 3/11 per link. A similar analysis applies to a general complete k-ary tree, and a recursive expression is written in the vein of (4.32):

$D^k_n = \left(\tfrac{1}{2}\right)^k \cdot kD^k_{n-1} + \left(1 - \left(\tfrac{1}{2}\right)^k\right)\left(1 + (k-1)D^k_{n-1} + kD^k_{n-2}\right)$   (4.35)

A closed-form expression is obtained by solving the above recursion. Letting $\beta = 1 - \left(\tfrac{1}{2}\right)^k$,

$D^k_n = \frac{k\beta}{(k-1)(k+\beta)}\,k^n - \frac{\beta^2}{(1+\beta)(k+\beta)}\,(-\beta)^n - \frac{\beta}{(k-1)(1+\beta)}.$   (4.36)

To determine the asymptotic per-link throughput, we divide (4.36) by the number of links in a complete k-ary tree, $\frac{k^{n+1}-1}{k-1} - 1$. Taking a limit as n grows large,

$\lim_{n\to\infty} \frac{D^k_n}{\frac{k^{n+1}-1}{k-1} - 1} = \lim_{n\to\infty} \frac{(k-1)D^k_n}{k^{n+1} - k} = \frac{2^k - 1}{2^k - 1 + k\cdot 2^k}.$   (4.37)

Since $2^k \gg 1$ for large k, this limit is approximately equal to $\frac{1}{k+1}$. Intuitively, each node can only activate one neighboring link, and each node has k + 1 neighbors.

4.3.2 On Distributed Optimality

In the above analysis of distributed scheduling over tree networks, it was assumed that priorities are assigned such that nodes closer to the root have higher priority. Interestingly, for tree networks, there exists an ordering of priorities such that the distributed policy is optimal, i.e., it returns a schedule of maximum weight, and therefore always performs at least as well as the optimal centralized scheduler.

Theorem 18.
There exists an optimal distributed algorithm on tree networks that obtains an expected sum-rate equal to that of a centralized scheduler with perfect information.

Figure 4-10: Example matchings. (a) Maximum matching. (b) Augmented matching including l. If link l is required to be in the matching, there exists a new maximal matching including l.

Proof. Consider the policy that gives priority to the leaves of the network. If a link adjacent to a leaf is ON, then without loss of generality there exists a maximum matching containing that link. To see this, assume the optimal matching did not include this ON link. A new matching is constructed by adding the leaf link and removing the link which interferes with it, as illustrated in Figure 4-10. Since the new link is adjacent to a leaf, at most one interferer exists in the matching. Thus, the augmented matching is also optimal. Therefore, it is always optimal to include an ON leaf link in the optimal matching. The links interfering with that leaf cannot be activated, and the algorithm recurses. In conclusion, assigning priorities in order of highest depth to lowest depth results in a maximum matching.

While Theorem 18 shows that there exists an optimal priority assignment, the result does not hold for general topologies. Thus, we use the results in Section 4.3.1 to compare the cost of suboptimal local decisions to the cost of scheduling with delayed information.

4.3.3 Centralized Scheduling on Tree Topologies

The optimal centralized policy schedules a maximum weight matching over the network, where the weight of each link is the belief given the delayed CSI. For tree networks, the maximum-weight matching is the solution to a dynamic programming (DP) problem. Consider a node v ∈ N. Let g1(v) be the maximum weight matching of the subtree rooted at v, assuming that v activates one of its child links.
Let g2(v) be the maximum weight matching of the subtree rooted at v, assuming that v cannot activate a child link due to interference from the parent of v. Let r ∈ N be the controller (root of the tree), and d_r(v) be the distance of node v from r. Let child(v) be the set of children of node v. Assume the controller has delayed CSI of each link (u, v), equal to s(u, v). The DP formulation for the weight of the optimal max-weight matching g∗(v) is given by

$g^*(v) = \max\big(g_1(v), g_2(v)\big)$   (4.38)
$g_2(v) = \sum_{u \in \mathrm{child}(v)} g^*(u)$   (4.39)
$g_1(v) = \max_{u \in \mathrm{child}(v)} \Big( p^{d_r(v)}_{s(u,v),1} + g_2(u) + \sum_{n \in \mathrm{child}(v)\setminus u} g^*(n) \Big) = g_2(v) + \max_{u \in \mathrm{child}(v)} \Big( p^{d_r(v)}_{s(u,v),1} + g_2(u) - g^*(u) \Big)$   (4.40)

While (4.38)–(4.40) give the optimal centralized schedule for a specific observation of delayed CSI, computing the average sum rate requires taking an expectation over the delayed CSI. For smaller trees, of depth 2, a closed-form expression for the average sum-rate is given in Section 4.3.3. For larger trees, this analysis becomes difficult; thus, bounds on the expected solution to the DP are derived in Section 4.3.3.

Let $C^k_n$ be the average sum rate of the centralized algorithm over a complete k-ary tree of depth n, when the root node is chosen to be the controller. Hence, the root node makes a decision for each link in the network based on delayed CSI, where delays are proportional to depth.

Sum-Rate Analysis for Trees of Depth 2

The centralized scheduling algorithm does not yield a simple recursive expression for the sum-rate throughput, as in the distributed case; however, the centralized sum-rate can be characterized analytically for simple trees. For a binary tree of depth 1, the centralized scheduling algorithm and the distributed scheduling algorithm are equivalent, since in both cases decisions are made with full CSI. Therefore, the sum rate is 3/4. Now consider a binary tree of depth 2.
The expected sum rate is computed by conditioning on the channel state of the links adjacent to the controller. If both links at the root are OFF, neither will be scheduled. If only one adjacent link is ON, it will always be scheduled. If both links are ON, then the controller must use the state of the links at depth 2 to determine which adjacent link to schedule.

$C^2_2 = \frac{1}{4}\cdot 2\left(\frac{3}{4}(1-p) + \frac{1}{4}p\right) + \frac{1}{2}\left(1 + \frac{3}{4}(1-p) + \frac{1}{4}p\right) + \frac{1}{4}\left(1 + \frac{15}{16}(1-p) + \frac{1}{16}p\right)$   (4.41)
$= \frac{3}{4} + \frac{3}{4}(1-p) + \frac{1}{4}p + \frac{1}{4}\left(\frac{15}{16}(1-p) + \frac{1}{16}p\right)$   (4.42)

Unlike the distributed case, the performance of the centralized scheduler clearly depends on the link transition probability p. If the centralized scheduler has perfect CSI, i.e., p = 0, the sum rate is 111/64, and when p = 1/2, the sum rate is 11/8. Thus, the presence of information leads to a 26% improvement in throughput. Since the average sum-rate decreases linearly in p, there exists a threshold p∗ such that distributed scheduling outperforms centralized scheduling for p ≥ p∗, as in Theorem 17. Evaluating (4.33) for n = 2 gives the expected throughput of the distributed policy, 27/16. Combining this with (4.42) gives the threshold p∗:

$p^* = \frac{3}{46} \approx 0.065$   (4.43)

Recall that the amount of memory in the channel state process is µ = 1 − 2p. Small values of p imply that the controller has very good knowledge of the network state, and there is little penalty to using delayed CSI. On the other hand, as p becomes large and the controller has stale information, it makes inaccurate decisions regarding the links on the second level of the tree. We now apply a similar analysis to a 2-level k-ary tree, by conditioning on the state of each of the root's k neighboring links.
$C^k_2 = \left(\tfrac{1}{2}\right)^k \cdot k\left[\left(1 - \left(\tfrac{1}{2}\right)^k\right)(1-p) + \left(\tfrac{1}{2}\right)^k p\right] + \sum_{n=1}^{k} \binom{k}{n}\left(\tfrac{1}{2}\right)^k \Bigg[ 1 + (k-n)\left[\left(1 - \left(\tfrac{1}{2}\right)^k\right)(1-p) + \left(\tfrac{1}{2}\right)^k p\right]$
$\qquad + \left(1 - \left(\tfrac{1}{2}\right)^k\right)^n (n-1)(1-p) + \sum_{m=0}^{n-1} \binom{n}{m}\left(1 - \left(\tfrac{1}{2}\right)^k\right)^m \left(\tfrac{1}{2}\right)^{k(n-m)} \big(m(1-p) + (n-m-1)p\big) \Bigg]$   (4.44)
$= (1-p)(k+1) + \frac{p(k-1) - (1-p)k}{2^k} - (1-2p)\left(1 - \left(\tfrac{1}{2}\right)^{k+1}\right)^k$   (4.45)

Comparing this to the value of $D^k_2$ in (4.36), we solve for the value of p∗(k) such that for p ≥ p∗(k), the distributed policy outperforms the centralized policy. Since (4.45) is affine in p,

$p^*(k) = \frac{C^k_2(0) - D^k_2}{(k+1) - (2k-1)\left(\tfrac{1}{2}\right)^k - 2\left(1 - \left(\tfrac{1}{2}\right)^{k+1}\right)^k}.$   (4.46)

The function p∗(k) is plotted in Figure 4-11. As k increases and the tree becomes wider, the threshold beyond which distributed scheduling outperforms centralized scheduling decreases exponentially, implying that distributed scheduling performs comparatively better for larger networks. Intuitively, as the tree grows wider, the probability of a “missed opportunity” scenario decreases. For large networks, the drawback of a distributed solution is reduced, while the drawback of a centralized approach, namely the delay in CSI, remains constant. This is observed in the fact that as k → ∞, the throughput of the distributed policy approaches the centralized throughput with perfect information.

Figure 4-11: Threshold value p∗(k) such that for p > p∗(k), distributed scheduling outperforms centralized scheduling on a 2-level k-ary tree.

An Upper Bound on the Sum-Rate of Centralized Scheduling

In this section, the sum-rate of the centralized scheduler is upper bounded to provide a sufficient condition for the existence of a distributed algorithm which outperforms the optimal centralized algorithm. The upper bound is constructed by recursively bounding the throughput attainable over a subtree. Let $C^k_n(\delta)$ be the expected sum-rate of a complete k-ary subtree of depth n, where the root of that subtree is a distance of δ hops from the controller.
Thus, the CSI of a link at depth h in the subtree is delayed by \delta + h - 1 time slots. Note that C_n^k(0) = C_n^k as defined in Section 4.3.3. To begin, consider the case of k = 2, i.e. the topology is a complete binary tree. For a binary tree rooted at node v, let c_L and c_R be the left and right children of v, respectively. The expected sum-rate is bounded by enumerating the possible states of the links incident to the controller. Label the links adjacent to the root as a and b. If both links a and b are OFF, as in Figure 4-12a, then the root schedules neither link, and instead schedules links over the two depth-(n-1) subtrees. If only link a (link b) is ON, then link a (b) will be scheduled, and the links adjacent to that link cannot be scheduled, as in Figure 4-12b (Figure 4-12c). If both a and b are ON, then the controller chooses the maximum between the scenarios in Figure 4-12b and Figure 4-12c. Combining these cases leads to an expression for centralized throughput.

C_n^2 = \frac{1}{4}\cdot 2C_{n-1}^2(1) + \frac{2}{4}\left(1 + C_{n-1}^2(1) + 2C_{n-2}^2(2)\right) + \frac{1}{4}\left(1 + E\left[\max\left\{g_1(c_L) + g_2(c_R),\ g_2(c_L) + g_1(c_R)\right\}\right]\right)   (4.47)

\le \frac{3}{4} + C_{n-1}^2(1) + C_{n-2}^2(2) + \frac{1}{4} E\left[g_1(c_L) + g_1(c_R)\right]   (4.48)

= \frac{3}{4} + \frac{3}{2} C_{n-1}^2(1) + C_{n-2}^2(2)   (4.49)

where g_1(.) and g_2(.) are defined in (4.39) and (4.40). The bound in (4.48) follows from the fact that g_1(u) >= g_2(u) for any node u in N. In order to obtain a recursive expression for C_n^2, we also need to bound C_n^2(\delta). Let \phi_l(s) be an indicator variable equal to 1 if and only if link l is activated in the optimal schedule when the delayed CSI of the network is given by s. Similarly, let \phi_l^\delta(s) be an indicator variable equal to 1 if and only if link l is activated in the optimal schedule when the CSI is further delayed by \delta slots.
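The derivation that follows leans on the k-step transition probabilities of the symmetric ON/OFF chain, in particular the decomposition p^{i+j}_{s,1} = p^j_{0,1} + (1-2p)^j p^i_{s,1}, which is used to obtain (4.52). A small verification sketch, iterating the chain directly:

```python
def k_step_on_prob(s, k, p):
    """p^k_{s,1}: probability the symmetric ON/OFF chain is ON after k steps,
    starting from state s (0 = OFF, 1 = ON), with flip probability p."""
    d = float(s)
    for _ in range(k):
        d = d * (1 - p) + (1 - d) * p  # one step of the chain
    return d
```

Checking k_step_on_prob(s, i + j, p) against k_step_on_prob(0, j, p) + (1 - 2p)**j * k_step_on_prob(s, i, p) for a grid of values confirms the identity numerically.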
Applying (4.16), the centralized sum rates are expressed as

C_n^2(0) = \sum_{s \in \mathcal{S}} P(S(t-d_r) = s) \sum_{l \in \mathcal{L}} \phi_l(s)\, p_{s_l,1}^{d_r(l)},   (4.50)

C_n^2(\delta) = \sum_{s \in \mathcal{S}} P(S(t-d_r) = s) \sum_{l \in \mathcal{L}} \phi_l^{\delta}(s)\, p_{s_l,1}^{d_r(l)+\delta}.   (4.51)

Figure 4-12: Possible scheduling scenarios for the centralized scheduler. (a) Neither link a nor link b is activated; the expected throughput is computed by the maximum expected matching over the remaining links, 2C_{n-1}^2(1). (b) If link a is scheduled, the links adjacent to it cannot be scheduled, while the remaining links can. (c) If link b is scheduled, the links adjacent to it cannot be scheduled, while the remaining links can.

Equation (4.51) is bounded in terms of (4.50):

C_n^2(\delta) = (1-2p)^{\delta} \sum_{s \in \mathcal{S}} P(S(t-d_r) = s) \sum_{l \in \mathcal{L}} \phi_l^{\delta}(s)\, p_{s_l,1}^{d_r(l)} + \sum_{s \in \mathcal{S}} P(S(t-d_r) = s) \sum_{l \in \mathcal{L}} \phi_l^{\delta}(s)\, p_{0,1}^{\delta}   (4.52)

\le (1-2p)^{\delta} \sum_{s \in \mathcal{S}} P(S(t-d_r) = s) \sum_{l \in \mathcal{L}} \phi_l(s)\, p_{s_l,1}^{d_r(l)} + \sum_{s \in \mathcal{S}} P(S(t-d_r) = s) \sum_{l \in \mathcal{L}} \phi_l^{\delta}(s)\, p_{0,1}^{\delta}   (4.53)

= (1-2p)^{\delta} C_n^2(0) + p_{0,1}^{\delta}\, E[\text{Number of Links Activated}]   (4.54)

\le (1-2p)^{\delta} C_n^2(0) + \frac{1}{3} p_{0,1}^{\delta}\, E[\text{Number of Links}]   (4.55)

\le (1-2p)^{\delta} C_n^2(0) + \frac{1}{3} p_{0,1}^{\delta}\, \left(E[\text{Number of Links}] + 1\right)   (4.56)

= (1-2p)^{\delta} C_n^2(0) + \frac{1}{3} p_{0,1}^{\delta}\, (2^{n+1} + 1)   (4.57)

Equation (4.52) follows from the identity p_{s,1}^{i+j} = p_{0,1}^{j} + (1-2p)^{j} p_{s,1}^{i}. Equation (4.53) follows from the fact that \phi_l(s) is the sum-rate maximizing schedule in C_n^2(0). The bound in (4.55) follows from noting that at most one third of the links can be simultaneously scheduled due to interference. Combining the bound in (4.57) with that in (4.49) yields a recursive expression from which the upper bound is computed.
C_n^2 \le \frac{3}{4} + \frac{3}{2}(1-2p)C_{n-1}^2 + \frac{1}{2}p(2^n + 1) + (1-2p)^2 C_{n-2}^2 + \frac{2}{3}p(1-p)(2^{n-1} + 1)   (4.58)

Solving the recursion in (4.58) yields a closed-form upper bound on the expected sum-rate throughput achievable by a centralized scheduler.

C_n^2 \le \frac{1}{80}\frac{(-146p - 15)\left(p - \frac{1}{2}\right)^n}{(2p-1)^2} + \frac{1}{320}\frac{(-94p + 135)(-4p + 2)^n}{(2p-1)^2} - \frac{1}{6}\,\frac{-8p^2 + 14p + 9}{(4p-1)(2p-3)} + \frac{1}{30}\frac{(160p^2 - 248p - 36)\left(p - \frac{1}{2}\right)^n}{(2p-3)(2p-1)^2} - \frac{1}{60}\frac{(40p^2 - 102p + 11)(-4p + 2)^n}{(2p-1)^2(4p-1)} + \frac{2^n}{3}   (4.59)

To interpret this bound, we compute the limiting ratio of the centralized throughput to the number of links in the tree (for p > 0):

\lim_{n \to \infty} \frac{C_n^2}{2^{n+1} - 2} = \frac{1}{6}   (4.60)

Note that this value is independent of p. This is because as long as the controller does not have perfect knowledge (i.e. p > 0), as n grows large, infinitely many nodes are sufficiently far from the root that the controller has no knowledge of their current state. One third of these links are scheduled (the size of a maximum cardinality matching) and they will be in the ON state with probability 1/2. Hence, the limiting per-link throughput is 1/6. Recall from (4.34) that the per-link average sum-rate under distributed scheduling is 3/11. Therefore, as the network grows large, distributed scheduling eventually outperforms centralized scheduling, regardless of the memory in the channel state process. Additionally, the threshold p*(n) at which distributed scheduling outperforms centralized scheduling is bounded for a tree of depth n by equating (4.33) and (4.59). Figure 4-13 shows this threshold as a function of n. Note that as n gets large, this threshold approaches zero, implying that distributed scheduling is always better than centralized scheduling in these cases, as expected from the asymptotic analysis.

Figure 4-13: Threshold value p*(n) such that for p > p*(n), distributed scheduling outperforms centralized scheduling on the n-level binary tree.

The bound also extends to k-ary trees.
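Before extending the bound, the recursion (4.58) and the limit (4.60) can be checked numerically. The base cases below are our own assumption (a depth-0 tree has no links; a depth-1 tree with perfect adjacent CSI yields 3/4); for p > 0 the limiting ratio does not depend on them:

```python
def upper_bound_seq(p, N, x0=0.0, x1=0.75):
    """Iterate the recursion (4.58) for the binary-tree upper bound C_n^2.
    x0, x1 are assumed base cases; the limit (4.60) is insensitive to them."""
    xs = [x0, x1]
    for n in range(2, N + 1):
        xs.append(0.75
                  + 1.5 * (1 - 2 * p) * xs[n - 1] + 0.5 * p * (2 ** n + 1)
                  + (1 - 2 * p) ** 2 * xs[n - 2]
                  + (2.0 / 3.0) * p * (1 - p) * (2 ** (n - 1) + 1))
    return xs
```

Dividing by the link count 2^{n+1} - 2 shows the per-link value tending to 1/6, which indeed falls below the distributed per-link rate 3/11.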
The bound in (4.55) is adapted to k-ary trees through the observation that a complete k-ary tree of depth n has \frac{k^{n+1} - k}{k - 1} links.

C_n^k(\delta) = (1-2p)^{\delta} C_n^k(0) + p_{0,1}^{\delta}\, E[\text{Number of Links Activated}]   (4.61)

\le (1-2p)^{\delta} C_n^k(0) + p_{0,1}^{\delta} \left\lceil \frac{1}{k+1} E[\text{Number of Links}] \right\rceil   (4.62)

\le (1-2p)^{\delta} C_n^k(0) + p_{0,1}^{\delta} \left( \frac{1}{k+1} E[\text{Number of Links}] + 1 \right)   (4.63)

= (1-2p)^{\delta} C_n^k(0) + p_{0,1}^{\delta} \left( \frac{k^{n+1} - k}{k^2 - 1} + 1 \right)   (4.64)

The bound in (4.64) is used to compute a recursion from which the upper bound is derived. We bound C_n^k using the same strategy as in (4.49):

C_n^k \le \left(\frac{1}{2}\right)^k k\, C_{n-1}^k(1) + k\left(\frac{1}{2}\right)^k \left[1 + k\, C_{n-2}^k(2) + (k-1)C_{n-1}^k(1)\right] + \sum_{j=2}^{k} \binom{k}{j} \left(\frac{1}{2}\right)^k \left[1 + k\, C_{n-1}^k(1)\right]   (4.65)

= 1 - \left(\frac{1}{2}\right)^k + k\left(1 - \left(\frac{1}{2}\right)^k\right) C_{n-1}^k(1) + k^2 \left(\frac{1}{2}\right)^k C_{n-2}^k(2)   (4.66)

Combining the bound in (4.66) with that in (4.64) yields

C_n^k \le k\left(1 - \left(\frac{1}{2}\right)^k\right)(1-2p)C_{n-1}^k + 1 - \left(\frac{1}{2}\right)^k + k^2\left(\frac{1}{2}\right)^k (1-2p)^2 C_{n-2}^k + k\left(1 - \left(\frac{1}{2}\right)^k\right) p \left(\frac{k^n - k}{k^2 - 1} + 1\right) + k^2 \left(\frac{1}{2}\right)^k\, 2p(1-p)\left(\frac{k^{n-1} - k}{k^2 - 1} + 1\right)   (4.67)

Inequality (4.67) can be solved to yield a closed-form upper bound on the centralized sum-rate for large trees.

4.4 Clique Topologies

In addition, we consider fully-connected mesh networks (i.e. clique topologies), in which each pair of nodes is connected. Compared to the tree networks of Section 4.3, mesh networks have a much smaller diameter, resulting in the centralized approach having access to fresher CSI.

4.4.1 Centralized Scheduling

Consider a fully-connected network where the channel state at each link is independent and identically distributed according to the Markov chain in Figure 4-2. In this network, an arbitrary node is chosen as the controller; the choice of controller does not affect throughput due to the network symmetry. In an N-node mesh, the controller is connected to every other node, so the controller has full information on N - 1 links and one-hop delayed information for the other \frac{(N-1)(N-2)}{2} links.
The average sum-rate attainable by a centralized controller is upper bounded by assuming there exists a maximum cardinality matching consisting of ON links (links with belief greater than the steady-state probability). The probability of this event occurring increases with the size of the network; consequently, this bound becomes tight as the network size increases. If the controller finds such a matching, the expected sum-rate is given by

C_n^{UB} = 1 + \left\lfloor \frac{n-2}{2} \right\rfloor (1 - q),   (4.68)

where q is the transition probability from ON to OFF, and \lfloor \frac{n-2}{2} \rfloor is the size of the maximum cardinality matching in the graph that remains after a link emanating from the controller has been included in the matching.

4.4.2 Distributed Scheduling

Next, we apply the distributed scheme to a clique topology. The distributed algorithm operates as follows: a node transmits over a randomly chosen ON neighboring link, if one exists, and otherwise does not transmit. Then the next node repeats this process, only considering ON links which do not interfere with any previously scheduled links. The average achievable sum-rate of this algorithm is computed recursively as follows. The first node to transmit has a probability 1 - (1 - \pi)^{n-1} of having an adjacent link in the ON state, where \pi is the steady-state probability defined in (4.5). If there exists an ON link, the two nodes adjacent to that link cannot activate any other links, so the next node schedules over an (n - 2)-node clique. On the other hand, if no neighboring links are ON, then no links are activated, and the next node schedules over an (n - 1)-node clique. The sum-rate is lower bounded by assuming that the next node to transmit always schedules over an (n - 2)-node clique, regardless of whether or not an ON link was found. This restricts the space of potential matchings which can be activated, and thus results in a lower bound on expected throughput.
D_n = \left(1 - (1-\pi)^{n-1}\right)(1 + D_{n-2}) + (1-\pi)^{n-1} D_{n-1}   (4.69)

\ge \left(1 - (1-\pi)^{n-1}\right) + D_{n-2}   (4.70)

Equation (4.70) yields a recursion which is solved to lower bound the average sum-rate of the distributed priority scheduler:

D_n \ge \frac{n}{2} - \frac{2 - \pi}{4\pi} + \frac{(1-\pi)^{n+1}}{\pi(2-\pi)} + \frac{\pi(-1)^n}{8 - 4\pi}   (4.71)

In the case where p = q (\pi = 1/2), this expression simplifies to

D_n \ge \frac{(-1)^n}{12} - \frac{3}{4} + \frac{n}{2} + \frac{2}{3}\left(\frac{1}{2}\right)^n.   (4.72)

As n increases, the expected fraction of nodes with an ON neighboring link tends to 1, implying that this bound is also asymptotically tight.

4.4.3 Comparison

The bounds in (4.72) and (4.68) combine to give a bound on p*, the value of the transition probability (for a symmetric channel) after which there exists a distributed policy that performs at least as well as the optimal centralized policy. For n even, the bound is given by

p^* \le \frac{4\left(1 - \left(\frac{1}{2}\right)^n\right)}{3(n-2)}.   (4.73)

Similarly, for odd values of n, combining (4.68) and (4.72) yields

p^* \le \frac{4\left(\frac{1}{2} - \left(\frac{1}{2}\right)^n\right)}{3(n-3)}.   (4.74)

Clearly, as n grows large, the distributed algorithm outperforms the optimal centralized scheduler for a wider range of channel transition probabilities p, since the upper bound goes to 0.

4.5 Simulation Results

In this section, the performance of the distributed policy is compared to the performance of a centralized controller through simulation. For the centralized case, a controller is chosen (off-line), and a maximum weighted matching is computed over the network, where the weight of each link is equal to the belief of that link. This is compared to a distributed approach in which priorities are assigned in reverse order of degree. For each network, we simulate decisions over 100,000 time slots. Each simulation assumes a symmetric channel state model (p = q). To begin, consider the six-node network in Figure 4-14, where the centralized controller is located at node 0.
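Before examining the simulated topologies, the clique expressions above can be sanity-checked numerically: the closed form (4.72) against direct iteration of (4.70), and the crossover bounds (4.73)–(4.74). This is a verification sketch; the base cases D_1 = 0 and D_2 = 1/2 follow from a single link being ON with probability 1/2:

```python
def distributed_lb_iter(n):
    """Iterate (4.70) for p = q (pi = 1/2), with D_1 = 0 and D_2 = 1/2."""
    xs = [0.0, 0.0, 0.5]
    for m in range(3, n + 1):
        xs.append(xs[m - 2] + 1 - 0.5 ** (m - 1))
    return xs[n]

def distributed_lb_closed(n):
    """Closed form (4.72) for the p = q lower bound."""
    return (-1) ** n / 12 - 0.75 + n / 2 + (2 / 3) * 0.5 ** n

def p_star_clique_bound(n):
    """Crossover bounds (4.73) for even n and (4.74) for odd n."""
    if n % 2 == 0:
        return 4 * (1 - 0.5 ** n) / (3 * (n - 2))
    return 4 * (0.5 - 0.5 ** n) / (3 * (n - 3))
```

For n = 10, the bound evaluates to roughly 0.1665, the value compared against the simulations below.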
The average sum-rate throughput, as a fraction of the perfect-CSI throughput (the throughput attainable by a centralized scheduler with perfect CSI), is plotted as a function of the channel state transition probability p in Figure 4-15. In Figure 4-17, the simulation is applied to the five-by-five grid network of Figure 4-16, where the centralized controller is located at the central-most node. Lastly, the simulation is applied to a 10-node, fully connected mesh network in Figure 4-18. These results show that for small p, modeling channels with high degrees of memory, a purely centralized controller is optimal. As p increases, the distributed scheme eventually outperforms the centralized scheme in each case. In Figure 4-18, we see that for p >= 0.16, the distributed algorithm outperforms the centralized algorithm. The bound on p* found in (4.73) for cliques gives a theoretical threshold of p* <= 0.1665; the theoretical bound thus agrees closely with the observed simulation results. Additionally, comparing the results for the 5x5 grid in Figure 4-17 with the clique in Figure 4-18, it is evident that the threshold p* is higher in the clique. This is because the information available to the centralized scheduler is less delayed in the clique than in the grid, where the diameter is larger. This illustrates the effect of the topology on the resulting performance of each scheduling approach.

Figure 4-14: A six-node sample network.

Figure 4-15: Results for the six-node network in Figure 4-14, over a horizon of 100,000 time slots. The plot shows the fraction of the perfect-CSI throughput obtained as a function of p, the transition probability of the channel state.
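The distributed clique scheduler is easy to reproduce in a few lines. The sketch below is our simplification of the scheme in Section 4.4.2 (not the thesis's simulation code): it fixes p = q = 1/2, so each link is independently ON with probability 1/2 in every slot, and uses a fixed node priority order with random tie-breaking:

```python
import random

def distributed_clique_sim(n, slots, seed=1):
    """Monte Carlo of the distributed priority scheduler on an n-node clique
    with p = q = 1/2 (i.i.d. link states). Returns average matched ON links."""
    rng = random.Random(seed)
    total = 0
    for _ in range(slots):
        state = {}
        matched = set()
        for u in range(n):            # nodes take turns in a fixed priority order
            if u in matched:
                continue
            on = [v for v in range(n)
                  if v != u and v not in matched
                  and state.setdefault((min(u, v), max(u, v)), rng.random() < 0.5)]
            if on:
                v = rng.choice(on)    # transmit over a random available ON link
                matched.update((u, v))
                total += 1
    return total / slots
```

For n = 6 the simulated average comfortably exceeds the analytical lower bound of about 2.344 from (4.72).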
Figure 4-16: A 5x5 grid network.

Figure 4-17: Results for a 5x5 grid network, over a horizon of 100,000 time slots. The plot shows the fraction of the perfect-CSI throughput obtained as a function of p, the transition probability of the channel state.

Figure 4-18: Results for the 10-node clique topology, over a horizon of 100,000 time slots. The plot shows the fraction of the perfect-CSI throughput obtained as a function of p, the transition probability of the channel state.

4.6 Partially Distributed Scheduling

Up to this point, this chapter has compared the performance of distributed scheduling with the performance of the optimal centralized schedule using delayed CSI. For large networks, the delayed CSI causes a reduction in the throughput of the centralized scheduler, as the links far from the controller have channel states largely independent of the available CSI at the controller. A distributed scheme is shown to outperform the centralized scheme in these scenarios; however, distributed policies suffer from the inability to compute a globally optimal schedule. An alternative to fully distributed scheduling is a partially distributed scheme, in which multiple controllers are used to schedule the links in local neighborhoods. In this section, we consider applying a partially distributed scheduling scheme to a binary tree, and show that this scheme outperforms both the fully centralized and fully distributed approaches. Consider an infinitely-deep binary tree.
A single centralized controller has no information pertaining to the majority of the network, and at most attains an average per-link throughput of 1/6, as shown in (4.60). We have shown that a distributed scheme outperforms the centralized scheme in this scenario. Now, we consider a partially distributed scheme to retain some of the benefits of centralized scheduling.

Figure 4-19: Example subtrees from the tree-partitioning algorithm: (a) subtree of depth 2; (b) subtree of depth 3.

Figure 4-20: Example partitioning of the infinite tree (only the first four levels shown): (a) partitioning into subtrees of depth 1; (b) partitioning into subtrees of depth 2. Dashed links, dotted links, and solid links each belong to different subtrees. The solid nodes represent controllers, which are located at the root of each subtree. Nodes labeled with B are border nodes.

The full binary tree is partitioned into subtrees of depth k, such that each non-leaf node in the subtree has degree 3. Subtrees of depth 2 and 3 are shown in Figure 4-19, and an example partitioning is shown in Figure 4-20. Observe that there exists a partitioning with subtrees of any depth. Each node in the original binary tree belongs to either one subtree or three subtrees. Define a border node to be a node which belongs to three subtrees, as illustrated by the nodes labeled B in Figure 4-20. After the binary tree is partitioned, a controller is placed in each partition such that the resulting rooted subtree has the desired depth. Each controller computes a schedule for its partition, using delayed CSI pertaining to the links in the subtree. In order to eliminate inter-subtree interference, multiple controllers cannot activate links adjacent to the same border node simultaneously.
Figure 4-21: Illustration of the border link labeling scheme.

Consider an algorithm which disables a set of links, such that a disabled link cannot be activated. We propose a link-disabling algorithm with the result that different control regions cannot interfere with one another. Note that this link-disabling scheme is inspired by the work in [58].

Theorem 19. It is sufficient to disable one link per subtree to completely eliminate inter-subtree interference.

Proof. To begin, note that inter-subtree interference only occurs at border nodes. Furthermore, each border node has degree three, and each link adjacent to the border node belongs to a different subtree. Based on the visualization of the tree in Figure 4-21, the three adjacent links at each border node are labeled as either U, L, or R, denoting whether the link is the upmost link, the left link, or the right link incident to the node. In each subtree, all leaves are border nodes, and a subtree of depth k will have 3 * 2^{k-1} leaves. Based on the partitioning scheme, one of the leaf links in each subtree is an L link or an R link, and the remainder of the leaf links will be U links, as illustrated in Figure 4-21. Consider the policy which disables all links labeled L or R. Each border node now has only one adjacent enabled link (the link labeled U), and thus interference between subtrees is removed. Furthermore, since each subtree has only one L or R leaf link, only one link is disabled per subtree.

The above scheme for inter-subtree contention resolution disables one link per subtree, leading to a loss in throughput. As the size of the subtree grows, this loss becomes negligible. Figure 4-22 shows the per-link throughput as a function of the state transition probability p for various subtree sizes. For small values of p, using subtrees of a larger depth yields higher throughput, as the delayed CSI is useful.
As p increases and delayed CSI becomes less valuable, it becomes optimal to use less information and add more controllers. Note that a partitioning with subtrees of depth 1 is fully distributed in the sense used in this chapter, as controllers use only local information with which to make scheduling decisions. This plot illustrates a region in which partially distributed scheduling outperforms both the fully centralized and the fully distributed solutions. Intuitively, by dividing the network into control regions, centralized scheduling is used within each region, and distributed scheduling is used across regions, providing a trade-off between using delayed CSI and making local decisions.

Figure 4-22: Per-link throughput of the tree-partitioning scheme, plotted as a function of the transition probability p for various subtree depths.

4.7 Conclusion

In this chapter, we studied the effect of using delayed channel state information (CSI) on the throughput of wireless scheduling. We showed that a centralized scheduling approach, while optimal with perfect CSI, suffers from having delayed CSI. Consequently, we showed that for rapidly-varying channels, distributed scheduling outperforms centralized scheduling. Similarly, as networks grow larger, distributed approaches become optimal. Since centralized policies are constrained to using delayed CSI, the location of the controller has an effect on the throughput performance of the scheduling algorithm. The choice of controller location corresponds to a choice of which information is accurate and which information is delayed. Thus, controllers should be placed at locations that are central, both in terms of high-degree nodes and the hop-based center of the network, so that more information is available with minimal delay. The problem of controller placement is addressed in Chapter 5.
Chapter 5

Controller Placement for Maximum Throughput

In the previous chapter, we established that channel state information (CSI) delays are inherent to centralized wireless scheduling. In deploying such a scheme, one node is assigned the role of controller, and collects CSI from the rest of the network. The controller then uses this CSI to select a set of nodes to transmit in each slot, in order to maximize throughput and avoid interference between neighboring links. As discussed in Chapter 4, CSI updates from distant links arrive at the controller after a delay that grows with the distance of the links from the controller. Since CSI delay reduces the throughput of the scheduling algorithm [69], the placement of the controller directly impacts network performance. The aim of this chapter is to study the impact of controller placement on network performance. In Section 5.1, we analyze the static controller placement problem, in which the controller placement is computed off-line and remains fixed over time. We provide an optimal formulation, which can be solved numerically for smaller networks, and a heuristic algorithm to compute a near-optimal controller placement for large networks. For a static controller placement, links near the controller achieve a high throughput, while links further away from the controller attain a lower throughput, due to the CSI delay at the controller. In order to mitigate this imbalance, the second half of this chapter investigates dynamic controller placement schemes, which change the location of the controller over time. This allows the controller to be moved to a region of the network with high backlogs, increasing throughput to this region and providing stability.

Figure 5-1: Markov chain describing the channel state evolution of each independent channel. State 0 corresponds to an OFF channel, while state 1 corresponds to an ON channel; the chain transitions from 0 to 1 with probability p and from 1 to 0 with probability q.
In Section 5.2, we propose a dynamic controller placement framework, in which the controller is repositioned on-line. Since at any time each node has a different view of the network state due to the distance-based CSI delays, the controller placement algorithm must depend only on information shared by all nodes, so that no additional communication overhead is required. First, we propose a queue-length-based controller placement algorithm, and show that this algorithm offers increased throughput over a static placement. We propose a joint controller placement and scheduling algorithm which is shown to be throughput optimal over the considered policy space. Second, we consider policies which use delayed CSI as well as delayed queue length information (QLI), and find a throughput-optimal policy over this policy space, while characterizing the improvement obtained by using this extra CSI.

5.1 Static Controller Placement

In this section, we consider an off-line controller placement, such that the controller remains fixed over time. We show that the optimal controller placement depends on the network topology as well as the channel transition probabilities.

5.1.1 System Model

Consider a network G(N, L) consisting of a set of nodes N and a set of links L. Each link is associated with an independent, time-varying channel, which is either ON or OFF. Let S_l(t) in {OFF, ON} be the state of the channel at link l at time t. Assume the channel state evolves over time according to the Markov chain in Figure 5-1. One of the nodes is assigned to be the controller, and in each time slot it activates a subset of links for transmission. Assume a primary interference constraint, in which a link activation is feasible if the activation is a matching, i.e. no two neighboring links are activated. If link l is activated and S_l(t) = ON, then a packet is successfully transmitted in that time slot. On the other hand, if the channel at link l is OFF, then the transmission fails.
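Given the chain of Figure 5-1, the belief a controller holds about a channel observed d slots ago has the standard closed form \pi + (s - \pi)(1 - p - q)^d, where \pi = p/(p + q) is the stationary ON probability. A quick cross-check of this fact against direct iteration of the chain (a verification sketch):

```python
def on_belief(d, s, p, q):
    """P(S(t) = ON | S(t-d) = s) for the chain of Figure 5-1, computed both in
    closed form and by iterating the chain d times; the two must agree."""
    pi = p / (p + q)                          # stationary ON probability
    closed = pi + (s - pi) * (1 - p - q) ** d
    prob = float(s)
    for _ in range(d):
        prob = prob * (1 - q) + (1 - prob) * p  # one step of the chain
    assert abs(prob - closed) < 1e-12
    return closed
```

As d grows, the belief decays geometrically toward \pi at rate |1 - p - q|, which is exactly the "memory" that the placement problem trades against distance.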
The objective of the controller is to activate the set of links resulting in maximum expected sum-rate throughput. In order to determine the correct subset of links to activate, the controller obtains channel state information (CSI) from each link in the network, and uses the CSI to compute a feasible link activation with maximum expected throughput. The scheduling problem for a fixed controller was presented in Chapter 4. Due to the physical distance between network nodes, and the propagation delay across each link, the CSI updates received at the controller are delayed proportionally to the distance between each link and the controller. In particular, let d_i(j) be the (symmetric) distance in hops between node i and node j. At time t, each node i has delayed CSI pertaining to node j from time slot t - d_i(j); in other words, node i has CSI S_j(t - d_i(j)) for node j.

5.1.2 Controller Placement Example

To begin, consider the example topology in Figure 5-2, and compare the expected throughput attainable by placing the controller at node A, node B, or node C. Placing the controller at node A yields the same expected throughput as placing the controller at node C, due to the symmetry of the network.

Figure 5-2: Barbell network.

Consider a generalization of the network in Figure 5-2, where A and C have degree k + 1. For simplicity, assume a symmetric Markov state in Figure 5-1, i.e. p = q. Let \gamma = (1/2)^k, the probability that all k outer links on one side are OFF. Placing the controller at node B results in an expected throughput of

\text{thpt}_B = \frac{1}{4}\cdot 2\left[(1-\gamma)p_{11}^1 + \gamma p_{01}^1\right] + \frac{1}{2}\left(1 + (1-\gamma)p_{11}^1 + \gamma p_{01}^1\right) + \frac{1}{4}\left(1 + (1-\gamma^2)p_{11}^1 + \gamma^2 p_{01}^1\right).   (5.1)

The above expression follows from conditioning on the state of the two links adjacent to node B. The first term corresponds to the expected throughput when both links are OFF, the second corresponds to the case when one is ON and the other is OFF, and the last term corresponds to both links being ON.
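Under our reading of (5.1) for the symmetric chain (so p^1_{11} = 1 - p and p^1_{01} = p), the expression is straightforward to evaluate, and as k grows it approaches the large-k limit (5.4). A sketch under those assumptions:

```python
def thpt_B(p, k):
    """Evaluate (5.1) for the symmetric chain: p^1_11 = 1 - p, p^1_01 = p,
    and gamma = (1/2)^k, the probability one side's outer links are all OFF."""
    g = 0.5 ** k
    on1 = (1 - g) * (1 - p) + g * p          # best outer link, one side
    on2 = (1 - g * g) * (1 - p) + g * g * p  # best outer link, either side
    return 0.25 * 2 * on1 + 0.5 * (1 + on1) + 0.25 * (1 + on2)
```

For large k the value converges to 3/4 + (5/4)(1 - p), matching the limit stated in (5.4) below in the text.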
Similarly, the expected throughput from a controller at node A is derived by conditioning on the state of the k + 1 links adjacent to node A.

\text{thpt}_A = (1-\gamma)\left[1 + \frac{1}{2}p_{11}^1 + \frac{1}{2}\left((1-\gamma)p_{11}^2 + \gamma p_{01}^2\right)\right] + \gamma\left[\frac{1}{2}\left(1 + (1-\gamma)p_{11}^2 + \gamma p_{01}^2\right) + \frac{1}{2}\left(\frac{1}{2}p_{11}^1 + \frac{1}{2}\left((1-\gamma)p_{11}^2 + \gamma p_{01}^2\right)\right)\right]   (5.2)

Consider the throughput obtained from a controller at A and at B in the limit as k grows to infinity, in which case \gamma = 0:

\lim_{k \to \infty} \text{thpt}_A = 1 + \frac{1}{2}(1-p) + \frac{1}{2}p_{11}^2   (5.3)

\lim_{k \to \infty} \text{thpt}_B = \frac{3}{4} + \frac{5}{4}(1-p).   (5.4)

For p <= 1/4, it is optimal to place the controller at node B in the center, and for p >= 1/4, it is optimal to place the controller at either node A or node C. This example highlights some important properties of the controller placement problem. In particular, it is clear the optimal placement depends on the channel transition probabilities. When p is small, it is advantageous to place the controller to minimize the CSI delay throughout the network. On the other hand, when p is close to 1/2, the CSI is no longer useful and it is better to maximize the degree of the controller, since the controller always has perfect information about its neighboring links.

5.1.3 Optimal Controller Placement

From the previous example, it is clear that the throughput-maximizing controller placement is a function of the channel state transition probabilities p and q, as well as the network topology. In this section, we present a mathematical formulation for the optimal controller location. Let M be the set of matchings in the network, i.e., for all M in M, M is a set of links which can be scheduled simultaneously without interfering with one another. Under a throughput maximization objective, the selected controller schedules the matching that maximizes expected sum-rate throughput with respect to the CSI delays at that node. Consequently, the controller placement can be optimized as follows.
c = \arg\max_r E_S\left[\max_{M \in \mathcal{M}} \sum_{l \in M} E\left[S_l(t) \,\middle|\, S_l(t - d_r(l)) = s_l\right]\right]   (5.5)

= \arg\max_r E_S\left[\max_{M \in \mathcal{M}} \sum_{l \in M} p_{s_l,1}^{d_r(l)}\right]   (5.6)

= \arg\max_r E_S\left[\max_{M \in \mathcal{M}} \sum_{l \in M} \left(\pi + (s_l - \pi)(1 - p - q)^{d_r(l)}\right)\right]   (5.7)

Equation (5.6) follows since the channel state satisfies S_l(t) in {0, 1}. Equation (5.7) follows from the definition of the k-step transition probability of the channel state Markov chain. Computing a maximum weighted matching can be formulated as an integer linear program (ILP), and is known to be solvable in O(|L|^3) time [55]. However, computing the optimal controller position in (5.7) requires computing the expectation of the maximum matching, which necessitates solving the ILP for every state vector S(t) in {0, 1}^{|L|}. Thus, the computational complexity of the controller placement problem is exponential in the number of links, and this computation is intractable for large networks.

5.1.4 Effect of Controller Placement

Figure 5-3: (Snowflake network) Symmetric network in which node A has degree k_1 and node B has degree k_2 + 1.

Since the computation of the optimal controller placement is difficult, it is important to quantify the sensitivity of the expected throughput to the location of the controller. Consider the snowflake network in Figure 5-3. Due to the symmetry of the network, there are three potential controller locations, labeled as nodes A, B, and C. The optimal controller placement is computed by solving (5.7), and the corresponding expected sum-rate throughput attainable from a controller at each location is shown in Figure 5-4 for k_1 = 4 and k_2 = 20. Placing the controller at node A results in the maximal sum rate for the majority of channel transition probabilities, except when the transition probability is close to 1/2, at which point node B becomes the optimal location. Figure 5-4 shows that in some operating regimes, placing the controller at the wrong location results in significant reduction in expected throughput.
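For small topologies the objective (5.7) can be brute-forced directly. The sketch below does this for the barbell example of Figure 5-2 with k = 2, enumerating all delayed-state vectors and all matchings; the per-link delay vectors encode our assumption that a link's CSI ages with the hop distance from the controller to the link's nearer endpoint:

```python
from itertools import product

def placement_value(edges, delays, p):
    """Objective (5.7) for the symmetric chain (q = p, pi = 1/2): expected
    max-weight matching, where link weights are delayed-CSI beliefs."""
    m = len(edges)
    matchings = []                       # all node-disjoint subsets of edges
    for mask in range(1 << m):
        used, ok = set(), True
        for i in range(m):
            if mask >> i & 1:
                u, v = edges[i]
                if u in used or v in used:
                    ok = False
                    break
                used.update((u, v))
        if ok:
            matchings.append(mask)
    total = 0.0
    for s in product((0, 1), repeat=m):  # equally likely delayed states
        w = [0.5 + (s[i] - 0.5) * (1 - 2 * p) ** delays[i] for i in range(m)]
        total += max(sum(w[i] for i in range(m) if mask >> i & 1)
                     for mask in matchings)
    return total / (1 << m)

# barbell with k = 2: leaves 0,1 - A(2) - B(3) - C(4) - leaves 5,6
edges = [(0, 2), (1, 2), (2, 3), (3, 4), (4, 5), (4, 6)]
delays_A = [0, 0, 0, 1, 2, 2]   # assumed CSI age of each link, seen from A
delays_B = [1, 1, 0, 0, 1, 1]   # assumed CSI age of each link, seen from B
```

At p = 0 the beliefs are exact regardless of delay, so every placement achieves the same perfect-CSI value; differences emerge only for p > 0.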
In particular, for a symmetric channel state model (p = q), if k_2 = 2k_1 and k_1 grows large, the throughput attainable from a controller at node A is 1 + (k_1 - 1)(1 - p), and the throughput attainable from a controller at node B is 1 + (1 - p) + (k_1 - 2)p_{11}^2. Therefore, as k_1 grows large, placing the controller at node A offers up to a 20% gain in throughput over placing it at node B, even though B has a higher degree. Furthermore, placing the controller at node A offers up to a 33% gain over placing the controller at node C. Clearly, computing the optimal controller location has a significant impact on the throughput performance of the network, and a simple largest-degree controller placement heuristic is insufficient. Note that placing the controller at node B, the high-degree node, is optimal when p approaches 1/2, at which point the CSI becomes useless for all links but those adjacent to the controller.

Figure 5-4: Sum-rate throughput resulting from placing the controller at each of the three possible node locations, with k_1 = 4 and k_2 = 20, as a function of the channel transition probability p = q.

5.1.5 Controller Placement Heuristic

In Section 5.1.3, a mathematical formulation for computing the optimal controller location in a network was presented, which depends on the distances between nodes as well as the channel state statistics. However, this computation has a complexity that grows exponentially with the size of the network. Section 5.1.4 shows that an accurate controller placement heuristic is required to prevent a significant loss in throughput. In this section, we propose a computationally tractable heuristic for computing the controller location, which is shown to be near-optimal in terms of the resulting expected throughput. Consider the following heuristic for placing the controller. Each node is assigned a weight based on its degree.
As the memory in the channel process decreases, the best controller location is the node most likely to have an ON neighboring link, i.e., the node with the highest degree. To model this, node n is assigned a weight of (1 − (1 − π)^{Δ_n}), where Δ_n is the degree of node n; this weight equals the probability that node n has an adjacent ON link. The controller is placed at the location maximizing the information about the network. Intuitively, the controller should be "close" to as many highly weighted nodes as possible. However, "closeness" must reflect the memory in the system. Thus, each node computes a function of the distance to each other node, (1 − p − q)^{d_r(n)}, and the controller is placed at the node maximizing the weighted sum over all nodes, as shown in (5.8). In summary, the controller is placed according to:

c∗ = arg max_r Σ_{n∈N} (1 − p − q)^{d_r(n)} (1 − (1 − π)^{Δ_n}).   (5.8)

Placing the controller according to (5.8) preserves the important properties of the optimal controller placement in (5.7). The heuristic in (5.8) is very similar to the well-known p-median problem [16], for p = 1. The 1-median problem seeks to find the node that minimizes the sum distance to all other nodes. In contrast, the controller placement heuristic assigns weights to nodes and uses a convex function of distance in this computation. These differences ensure that the controller is placed at the location that yields high throughput, which may not be the same as the solution to the 1-median problem. Consider the barbell network in Figure 5-2. Figure 5-5 shows the expected throughput for a controller at node A and a controller at node B, as well as the value of the heuristic objective in (5.8).

Figure 5-5: Evaluation of the controller placement heuristic for the barbell network and various channel transition probabilities p = q: (a) expected throughput, (b) heuristic weight.

These results show the controller placement in (5.8) is similar in terms of throughput to the optimal placement.
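The heuristic in (5.8) is straightforward to implement once hop distances are available. The following sketch is our own rendering (assuming d_r(n) is the hop distance, computed here by breadth-first search) and is not code from the thesis:

```python
from collections import deque

def bfs_distances(adj, src):
    """Hop distances from src in an unweighted graph given as an adjacency-list dict."""
    dist = {src: 0}
    dq = deque([src])
    while dq:
        u = dq.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                dq.append(v)
    return dist

def heuristic_placement(adj, p, q):
    """Controller location maximizing the objective in (5.8)."""
    pi = p / (p + q)
    # weight of node n: probability that some link adjacent to n is ON
    weight = {n: 1.0 - (1.0 - pi) ** len(adj[n]) for n in adj}
    mem = 1.0 - p - q                       # channel memory factor
    def objective(r):
        d = bfs_distances(adj, r)
        return sum(mem ** d[n] * weight[n] for n in adj)
    return max(adj, key=objective)
```

On a star graph the heuristic returns the hub, and on a path it returns the middle node, consistent with the intuition that the controller should sit close to the highly weighted nodes.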
When the heuristic offers a different controller placement, the difference from the throughput obtained under the optimal placement is small. In general, the heuristic returns a controller location that yields throughput close to that of the optimal placement. Consider the NSFNET topology in Figure 5-6. For this topology, the heuristic of (5.8) is applied and compared to the optimal controller placement, as shown in Table 5.1. Often, the heuristic controller placement is the same as the throughput-optimal controller placement. Furthermore, in instances where the throughput-optimal location differs from the heuristic location, the controller is placed at a location yielding an average throughput within 1% of optimal.

Figure 5-6: 14-node NSFNET backbone network (1991).

Table 5.1: Results of the controller placement problem over the NSFNET topology. Optimal placement is computed by solving (5.7) via brute force, while heuristic placement refers to (5.8).

p      | Optimal Placement | Heuristic Placement | % Error
0.05   | 6                 | 6                   | 0
0.10   | 6                 | 6                   | 0
0.15   | 6                 | 6                   | 0
0.20   | 6                 | 6                   | 0
0.25   | 6                 | 6                   | 0
0.30   | 6                 | 6                   | 0
0.35   | 10                | 6                   | 0.0289
0.40   | 10                | 6                   | 0.2974
0.45   | 10                | 6                   | 0.5704

5.1.6 Multiple Controllers

In Section 4.6, a partially distributed scheduling scheme, in which the network is partitioned into sub-networks and multiple controllers are used to control each partition independently, is shown to outperform both fully centralized and fully distributed scheduling in certain operating regimes. Formulating the optimal k-controller placement problem is difficult due to the necessity of resolving conflicts on the boundaries of the control regions. Despite this challenge, the heuristic in Section 5.1.5 can be extended to multiple controllers. This extension is analogous to the extension of the 1-median problem to the p-median problem. Let r = (r1, ..., rk) ∈ N^k be a vector of locations for the k controllers.
The k-controller placement heuristic is formulated as

c∗ = arg max_{r∈N^k} Σ_{n∈N} (1 − p − q)^{min_i d_{r_i}(n)} (1 − (1 − π)^{Δ_n}).   (5.9)

The k-controller heuristic is similar to the 1-controller heuristic in (5.8), with the modification that nodes are weighted by a function of the distance to the closest controller. Assigning each node to the closest controller maximizes the expression in (5.9), and yields the highest expected throughput, since the controller closest to a link has the most accurate CSI pertaining to that link. The optimization in (5.9) involves iterating through each combination of k nodes, the complexity of which grows as C(N, k). Therefore, we propose a low-complexity heuristic to place the k controllers. To begin, consider the Myopic Controller Placement algorithm, which places each controller sequentially, taking the previously placed controllers as fixed.

Algorithm 1 Myopic Controller Placement
1: Given C_0 = {};
2: for j = 1 → k do
3:   c_j = arg max_{r∈N} Σ_{n∈N} (1 − p − q)^{min_{i ∈ C_{j−1} ∪ {r}} d_i(n)} (1 − (1 − π)^{Δ_n})   (5.10)
4:   C_j = C_{j−1} ∪ {c_j};
5: end for

At each iteration, the myopic controller placement algorithm finds the location for a new controller such that each node is controlled by either a controller in C_{j−1} or the new controller. After executing the myopic controller placement algorithm, C_k is a feasible set of controller locations, but is potentially suboptimal. To improve the quality of this solution, the controller exchange algorithm is used to refine the solution. A similar algorithm is used as a heuristic approximation to the p-median problem in [64]. The controller exchange algorithm refines the selection of the controllers by selecting an element r ∈ C_k at random, and searching for a node to replace r as a controller that results in a higher heuristic weight. The controller exchange algorithm circumvents the local optima resulting from the myopic placement algorithm.
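Algorithm 1 can be rendered in a few lines of Python. The sketch below is our own illustration (hop distances by BFS, and our own function names), under the assumption that d_i(n) is the hop distance between nodes i and n:

```python
from collections import deque

def hop_distances(adj, src):
    """Hop distances from src via breadth-first search."""
    dist = {src: 0}
    dq = deque([src])
    while dq:
        u = dq.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                dq.append(v)
    return dist

def myopic_placement(adj, p, q, k):
    """Algorithm 1: greedily add controllers one at a time, each maximizing the
    k-controller heuristic objective (5.10) given the controllers already placed."""
    pi = p / (p + q)
    mem = 1.0 - p - q
    weight = {n: 1.0 - (1.0 - pi) ** len(adj[n]) for n in adj}
    dist = {r: hop_distances(adj, r) for r in adj}   # all-pairs hop distances

    placed = []
    for _ in range(k):
        def score(r):
            return sum(mem ** min(dist[c][n] for c in placed + [r]) * weight[n]
                       for n in adj)
        placed.append(max((r for r in adj if r not in placed), key=score))
    return placed
```

On a barbell graph (two hubs joined by an edge), the two-controller myopic placement selects the two hubs, which is also the intuitive optimum.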
To verify the performance of these heuristics, each algorithm is run over various random geometric graphs. A random geometric graph (RGG) with N nodes and connectivity radius R is a random graph in which N nodes are placed uniformly at random in the unit square, and two points are connected if the Euclidean distance between them is less than R [52].

Algorithm 2 Controller Exchange Algorithm
Input: C_k: a set of k controller locations;
1: while 1 do
2:   C0 = C_k
3:   Generate a random permutation x of C_k
4:   for r ∈ x do
5:     C' = C_k \ r
6:     c = arg max_{r'∈N} Σ_{n∈N} (1 − p − q)^{min_{i ∈ C' ∪ {r'}} d_i(n)} (1 − (1 − π)^{Δ_n})   (5.11)
7:     if c ≠ r then
8:       C_k = C' ∪ {c}
9:       Break;
10:    end if
11:  end for
12:  if C_k = C0 then
13:    Break;
14:  end if
15: end while

Numerous random graphs of 20, 30, and 40 nodes are generated, the myopic placement policy and the controller exchange policy are applied to these RGGs, and each of these algorithms is compared with the solution obtained from (5.9). The results of this experiment are presented in Table 5.2. The myopic policy is shown to return a weight close to that of the optimal solution, and the exchange algorithm offers further improvement. In many instances, the output of the exchange algorithm is in fact the same as the controller placement of (5.9). Figure 5-7 gives sample controller placements over RGGs, showing that the controllers are placed at highly central nodes, while providing good information coverage throughout the network.

5.2 Dynamic Controller Placement

For a fixed controller location, the links physically close to the controller operate at a higher throughput than those far from the controller due to the delay in CSI. By relocating the controller, the throughput in different regions of the network can be balanced.

Figure 5-7: Random geometric graph with multiple controllers, (a) 3 controllers, (b) 4 controllers, placed using the myopic placement algorithm, followed by the controller exchange algorithm.
Link colors correspond to the distance from the nearest controller.

Table 5.2: Maximum weight for different controller placement algorithms over random geometric graphs.

(a) Experiment 1: 30 nodes, connectivity radius R = 0.275, 10 iterations, p = q = 0.3.
# Controllers | Myopic | Exchange | Optimal
2             | 91.46  | 94.241   | 94.479
3             | 109.68 | 111.51   | 111.984
4             | 122.04 | 122.72   | 122.82

(b) Experiment 2: 20 nodes, connectivity radius R = 0.35, 20 iterations, p = q = 0.3.
# Controllers | Myopic | Exchange | Optimal
2             | 144.62 | 149.53   | 149.55
3             | 166.88 | 168.88   | 169.03
4             | 181.01 | 181.6    | 181.6
5             | 193.34 | 193.505  | 193.515

(c) Experiment 3: 40 nodes, connectivity radius R = 0.25, 10 iterations, p = q = 0.3.
# Controllers | Myopic | Exchange | Optimal
2             | 111.94 | 113.76   | 113.76
3             | 136.82 | 138.66   | 138.66

In this section, we consider policies which recompute the controller location dynamically in order to balance the throughput throughout the network.

Figure 5-8: Wireless Downlink.

For simplicity of exposition, consider a system of M nodes operating under an interference constraint such that only one node can transmit at any time, as in Figure 5-8. Packets arrive externally to each node i according to an i.i.d. Bernoulli arrival process A_i(t) of rate λ_i, and are stored in a queue at that node to await transmission. Let Q_i(t) be the packet backlog of node i at time t. Each node has access to an independent time-varying ON/OFF channel as in Figure 5-1. If a node is scheduled for transmission, has a packet to transmit, and has an ON channel, then a packet departs the system from that node.

The above network model applies directly to a wireless downlink or uplink; however, it can easily be extended to a network setting. First, instead of the controller selecting one node to transmit, a set of non-interfering nodes is scheduled to transmit.
The extension involves changing the scheduling optimization to be over all matchings in the network, rather than all individual nodes. Second, in a network, packets are required to traverse multiple hops en route to their destinations. This extension requires a modification to the throughput-optimal policy of Theorem 21, analogous to the approach taken in [48]. In addition to each node i having delayed CSI pertaining to each other node j from d_i(j) time-slots in the past, it has delayed queue length information (QLI) as well. In other words, node i has delayed CSI S_j(t − d_i(j)) and delayed QLI Q_j(t − d_i(j)) for each other node j. Let S(t − d_r) represent the vector of delayed CSI available to controller r, i.e. S(t − d_r) = {S_i(t − d_r(i))}_i. Let d_max = max_{i,j} d_j(i), i.e. d_max is the network diameter. As described previously, one node is assigned the role of the controller. The controller uses delayed CSI and QLI to determine a schedule. Every N time-slots, the location of the controller is recomputed. In order to perform this computation, each node must be able to compute the controller at the current slot without communicating with the other nodes. Therefore, the controller selection algorithm must depend only on globally available information. In particular, we consider algorithms that are based only on sufficiently delayed QLI, and do not consider CSI in deciding where to place the controller. Since CSI and QLI are available at each node with different delays, additional delays are introduced to ensure that each node has the same view of the network state for controller placement. In Section 5.2.2, we consider controller placement policies using only delayed QLI, since it is known that delayed QLI does not affect the throughput performance of the system [41].
In Section 5.2.3, this is extended to policies which also use homogeneously delayed CSI for controller placement, as older CSI might also be available at each node, and can be used to increase the throughput region.

The primary objective of this work is to determine a joint controller placement and scheduling policy to stabilize the system of queues. We now provide a definition of stability.

Definition. A queue with backlog Q_i(t) is stable under policy π if

lim sup_{n→∞} (1/n) Σ_{t=0}^{n−1} E[Q_i(t)] < ∞.   (5.12)

The complete network is stable if all queues are stable.

Definition. The throughput region Λ is the closure of the set of all rate vectors λ that can be stably supported over the network by a policy π ∈ Π.

Lastly, we define a throughput-optimal policy as follows.

Definition. A policy is said to be throughput optimal if it stabilizes the system for any arrival rate λ ∈ Λ.

In this work, we characterize the throughput region of the controller placement and scheduling problem above, and propose a throughput-optimal controller placement and scheduling policy based on the information available at each node.

5.2.1 Two-Node Example

Figure 5-9: Example 2-node system model.

To illustrate the effect of dynamic controller relocation, consider a two-node system, as in Figure 5-9. Each node has instantaneous CSI pertaining to its own channel at the current time, and 1-step delayed CSI of the other channel. Let Λ1 be the throughput region when the controller is fixed at node 1, and let Λ2 be the throughput region when the controller is fixed at node 2. The throughput region Λ_r is computed for each r by solving the following linear program (LP).
Maximize: ε
Subject to:
λ_i + ε ≤ Σ_{(s1,s2)∈S} P(S(t − d_r) = (s1, s2)) α_i(s1, s2) E[S_i(t) | S_i(t − d_r(i)) = s_i],  ∀i ∈ {1, 2}
Σ_{i=1}^{2} α_i(s1, s2) ≤ 1,  ∀s ∈ S
α_i(s1, s2) ≥ 0,  ∀s ∈ S, i ∈ {1, 2}   (5.13)

In the above LP, α_i(s1, s2) represents the fraction of time link i is scheduled when the delayed CSI at the controller is (s1, s2). To maintain stable queue lengths, the arrival rate to each queue must be less than the service rate at that queue, which is dictated by the fraction of time the node transmits and the expected throughput obtained over that link. For the case when the controller is at node r, Λ_r is the set of arrival rate pairs λ = (λ1, λ2) such that there exists a solution to (5.13) satisfying ε∗ > 0. This corresponds to the existence of a scheduling policy which transmits over link i with probability α_i(s1, s2) when the observed states of the channels are s1 and s2 respectively. The proof that Λ_r is in fact the stability region of the system is found in [69].

The throughput regions Λ1 and Λ2 are plotted in Figure 5-10 for the case when p = q = 0.1. The throughput region is larger in the dimension of the controller, as a higher throughput is obtained at the node for which current CSI is available. The other node cannot attain the same throughput due to the CSI delay at the controller. Now consider a time-sharing policy, alternating between placing the controller at node 1 and placing the controller at node 2. The resulting throughput region Λ is given by the convex hull of Λ1 and Λ2, which is shown as the dotted black line in Figure 5-10.

Figure 5-10: Throughput regions for different controller scenarios. Assume the channel state model satisfies p = 0.1, q = 0.1, and d1(2) = d2(1) = 1.
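The LP (5.13) is small enough to solve directly. The sketch below is our own illustration, not the thesis's code: it assumes SciPy's linprog, symmetric channels p = q = 0.1, and the controller fixed at node 1 (instantaneous own CSI, 1-step delayed CSI of node 2), and returns the largest slack ε for a given arrival pair.

```python
from itertools import product
from scipy.optimize import linprog

def max_slack(lam, p=0.1, q=0.1):
    """Largest eps such that (lam[0]+eps, lam[1]+eps) is supportable with the
    controller at node 1, per the LP (5.13)."""
    pi = p / (p + q)
    mem = 1.0 - p - q
    states = list(product((0, 1), repeat=2))
    prob = {s: (pi if s[0] else 1 - pi) * (pi if s[1] else 1 - pi) for s in states}
    # service efficiency of node i in state s: E[S_i(t) | delayed CSI]
    eff = {s: (float(s[0]), pi + (s[1] - pi) * mem) for s in states}

    # variables: [eps, a_1(s) for s in states, a_2(s) for s in states]
    n = len(states)
    c = [-1.0] + [0.0] * (2 * n)          # maximize eps
    A_ub, b_ub = [], []
    for i in range(2):                    # lam_i + eps <= sum_s P(s) a_i(s) eff_i(s)
        row = [1.0] + [0.0] * (2 * n)
        for k, s in enumerate(states):
            row[1 + i * n + k] = -prob[s] * eff[s][i]
        A_ub.append(row)
        b_ub.append(-lam[i])
    for k in range(n):                    # a_1(s) + a_2(s) <= 1
        row = [0.0] * (1 + 2 * n)
        row[1 + k] = 1.0
        row[1 + n + k] = 1.0
        A_ub.append(row)
        b_ub.append(1.0)
    bounds = [(None, None)] + [(0.0, 1.0)] * (2 * n)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
    return res.x[0]
```

For an asymmetric pair such as (0.3, 0.1) the slack is positive, while a symmetric pair like (0.45, 0.45) lies outside Λ1 and yields a negative slack, matching the shape of the region in Figure 5-10.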
Time-sharing between controller placements allows for higher throughput than if the controller is fixed at either node. For example, the point (λ1, λ2) = (3/8 − ε, 3/8 − ε), for small ε, is not attainable by any fixed controller placement; however, this throughput point is achieved by an equal time-sharing between controller locations. The correct time-sharing between controller placements depends on the arrival rates. However, this information is usually unavailable, and we desire a control policy that stabilizes the system even if the arrival rates change. Thus, we propose a dynamic controller placement and scheduling policy which achieves the full throughput region Λ using only delayed QLI for controller placement, and delayed CSI and QLI for scheduling, with no information pertaining to the arrival rates.

5.2.2 Queue Length-based Dynamic Controller Placement

We consider controller placement policies that depend only on delayed QLI. We assume that CSI is not available for use in placing the controller.¹ Let Π be the set of all policies which make a controller-placement decision based on QLI and not CSI, and schedule a node to transmit based on the delayed CSI and QLI at the controller. This section proves that dynamically computing the controller placement as a function of queue lengths increases the throughput region over policies with fixed controller placements. The throughput region under such policies is evaluated, and the dynamic controller placement and scheduling (DCPS) policy is proposed and shown to stabilize the system for all arrival rates within the throughput region.

¹ For networks with a large diameter, the common CSI may be too stale to be used in controller placement; thus, we restrict our attention to policies which utilize QLI, but not CSI, to make controller placement decisions.

Throughput Region

Theorem 20 shows that the throughput region is computed by solving the following LP.
Maximize: ε
Subject to:
λ_i + ε ≤ Σ_{s∈S} P_S(s) Σ_{r=1}^{M} β_r α_i^r(s) E[S_i(t) | S_i(t − d_r(i)) = s_i],  ∀i ∈ {1, ..., M}
Σ_{i=1}^{M} α_i^r(s) ≤ 1,  ∀s ∈ S, r ∈ {1, ..., M}
α_i^r(s) ≥ 0,  ∀s ∈ S, i, r ∈ {1, ..., M}
Σ_{r=1}^{M} β_r ≤ 1
β_r ≥ 0,  ∀r ∈ {1, ..., M}   (5.14)

This LP is an extension of the LP given in (5.13) to M nodes, with the addition of time-sharing between controller locations. The optimization variables β_r and α_i^r(s) correspond to controller placement and link scheduling policies respectively. The variable β_r represents the fraction of the time that node r is elected to be the controller, and α_i^r(s) is the fraction of time that controller r schedules node i when the controller observes delayed CSI S(t − d_r) = s. Note that P_S(s) is the stationary probability of the Markov chain in Figure 5-1. The throughput region Λ is the set of all non-negative arrival rate vectors λ such that there exists a feasible solution to (5.14) for which ε∗ ≥ 0. This implies that there exists a stationary policy such that the effective service rate at each queue is greater than the arrival rate to that queue.

Theorem 20 (Throughput Region). For any non-negative arrival rate vector λ, the system can be stabilized by some policy P ∈ Π if and only if λ ∈ Λ.

Necessity is shown in Lemma 15, and sufficiency is shown in Theorem 21 by proposing a throughput-optimal joint scheduling and controller placement algorithm, and proving that for all λ ∈ Λ, that policy stabilizes the system.

Lemma 15. Suppose there exists a policy P ∈ Π that stabilizes the network for arrival rate vector λ. Then, there exist β_r and α_i^r(s) such that (5.14) has a solution with ε∗ ≥ 0.

Proof. Suppose the system is stabilized by some control policy P, consisting of functions β_r(t), which choose a controller independent of channel state, and α_i^r(t), which choose a link activation based on the delayed CSI at the controller.
Without loss of generality, let β_r(t) be an indicator function signaling whether node r is the controller at time t, and let α_i^r(t) be an indicator signaling whether link i is scheduled for transmission at time t. Under any such scheme, the following relationship holds between arrivals, departures, and backlogs for each queue:

Σ_{τ=1}^{t} A_i(τ) ≤ Q_i(t) + Σ_{τ=1}^{t} μ_i(β_r(τ), α_i^r(τ)),   (5.15)

where μ_i is the service rate of the i-th queue as a function of the control decisions. Expanding μ_i in terms of the decision variables β_r(t) and α_i^r(t) yields

Σ_{τ=1}^{t} A_i(τ) ≤ Q_i(t) + Σ_{τ=1}^{t} Σ_{r=1}^{M} β_r(τ) α_i^r(τ) E[S_i(τ) | S_i(τ − d_r(i))].   (5.16)

Let T_r be the subintervals of [1, t] over which r is the controller. Further, let T_S^r be the subintervals of T_r over which controller r observes delayed CSI S(t − d_r) = S. Let |T_r| and |T_S^r| be the aggregate lengths of these intervals. Since the arrival and channel state processes are ergodic, and the number of channel states and queues is finite, there exists a time t1 such that for all t ≥ t1, the empirical average arrival rates and state occupancy fractions are within ε of their expectations:

(1/t) Σ_{τ=1}^{t} A_i(τ) ≥ λ_i − ε   (5.17)

|T_S^r| / |T_r| ≤ P(S_i(t) = S | r) + ε = P(S_i(t) = S) + ε.   (5.18)

The above equations hold with probability 1 by the strong law of large numbers [8]. Furthermore, since the system is stable under the policy P, [48] shows that there exists a V such that for arbitrarily large t,

P( Σ_{i=1}^{M} Q_i(t) ≤ V ) ≥ 1/2.   (5.19)

Thus, let t be a large time index such that t ≥ t1 and V/t ≤ ε. If Σ_{i=1}^{M} Q_i(t) ≤ V, the inequality in (5.16) can be rewritten by dividing by t:

(1/t) Σ_{τ=1}^{t} A_i(τ) ≤ (1/t) V + (1/t) Σ_{τ=1}^{t} Σ_{r=1}^{M} β_r(τ) α_i(τ) E[S_i(τ) | S_i(τ − d_r(i))]   (5.20)

λ_i − ε ≤ (1/t) Σ_{τ=1}^{t} A_i(τ) ≤ ε + (1/t) Σ_{τ=1}^{t} Σ_{r=1}^{M} β_r(τ) α_i(τ) E[S_i(τ) | S_i(τ − d_r(i))].   (5.21)

The lower bound in (5.21) follows from (5.17).
Since β_r(τ) = 1 if and only if τ ∈ T_r, the inequality in (5.21) is equivalent to

λ_i ≤ 2ε + Σ_{r=1}^{M} (1/t) Σ_{τ∈T_r} α_i(τ) E[S_i(τ) | S_i(τ − d_r(i))]   (5.22)
    = 2ε + Σ_{r=1}^{M} (|T_r|/t) (1/|T_r|) Σ_{τ∈T_r} α_i(τ) E[S_i(τ) | S_i(τ − d_r(i))]   (5.23)
    = 2ε + Σ_{r=1}^{M} β_r (1/|T_r|) Σ_{τ∈T_r} α_i(τ) E[S_i(τ) | S_i(τ − d_r(i))].   (5.24)

The last equation follows from defining

β_r ≜ |T_r| / t,   (5.25)

the empirical fraction of time that r is the controller. Now, break the summation over T_r into separate summations over the sub-intervals T_S^r for each observed S. Note that E[S_i(τ) | S_i(τ − d_r(i))] is the k-step transition probability of the Markov chain in Figure 5-1 for k = d_r(i).

λ_i ≤ 2ε + Σ_{r=1}^{M} β_r (1/|T_r|) Σ_{S∈S} Σ_{τ∈T_S^r} α_i(τ) p^{(d_r(i))}_{S_i,1}   (5.26)
    = 2ε + Σ_{r=1}^{M} β_r Σ_{S∈S} (|T_S^r|/|T_r|) (1/|T_S^r|) Σ_{τ∈T_S^r} α_i(τ) p^{(d_r(i))}_{S_i,1}   (5.27)
    = 2ε + Σ_{r=1}^{M} β_r Σ_{S∈S} (|T_S^r|/|T_r|) α_i^r(S) p^{(d_r(i))}_{S_i,1}   (5.28)
    ≤ Σ_{r=1}^{M} β_r Σ_{S∈S} P(S_i(t) = S) α_i^r(S) p^{(d_r(i))}_{S_i,1} + (2 + |S|)ε,   (5.29)

where (5.28) follows from defining the fraction of time that link i is scheduled given r and S as

α_i^r(S) ≜ (1/|T_S^r|) Σ_{τ∈T_S^r} α_i(τ),   (5.30)

and (5.29) follows from (5.18) and the fact that the controller placement is independent of the channel state. Because the control functions satisfy Σ_r β_r(t) ≤ 1 and Σ_i α_i(t) ≤ 1, it follows that β_r and α_i^r satisfy those same constraints. Furthermore, the fraction of time node r is the controller, β_r, is independent of the CSI.

The above inequality assumes Σ_{i=1}^{M} Q_i(t) ≤ V, which holds with probability greater than 1/2 by (5.19). Hence, there exists a set of stationary control decisions β_r and α_i^r satisfying the necessary constraints such that (5.29) holds for all i. If there did not exist such a stationary policy, then this inequality would hold with probability 0. Therefore, λ is arbitrarily close to a point in the region Λ, implying the constraints imposed by Λ are necessary for stability.
Lemma 15 shows that for all λ ∈ Λ, there exists a stationary policy STAT ∈ Π that stabilizes the system, which places the controller at r with probability β_r, and schedules i to transmit with probability α_i^r(S) when the delayed CSI at controller r is S.

Queueing Dynamics

Consider a scheduling and controller placement policy P ∈ Π. Let D_i^P(t) be the departure process of queue i, such that D_i^P(t) = 1 if there is a departure from queue i at time t under policy P. Consider the evolution of the queues over T time-slots, subject to a scheduling policy P:

Q_i(t + T) ≤ [ Q_i(t) − Σ_{k=0}^{T−1} D_i^P(t + k) ]^+ + Σ_{k=0}^{T−1} A_i(t + k).   (5.31)

Equation (5.31) is an inequality rather than an equality due to the assumption that the departures are taken from the backlog at the beginning of the T-slot period, and the arrivals occur at the end of the T slots. Under this assumption, the packets that arrive within the T-slot period cannot depart within this period. The square of the queue backlog is bounded using the inequality in (5.31):

Q_i^2(t + T) ≤ Q_i^2(t) + ( Σ_{k=0}^{T−1} A_i(t + k) )^2 + ( Σ_{k=0}^{T−1} D_i^P(t + k) )^2 + 2 Q_i(t) ( Σ_{k=0}^{T−1} A_i(t + k) − Σ_{k=0}^{T−1} D_i^P(t + k) ).   (5.32)

The above bound follows using the fact that A_i(t) ≥ 0 and D_i(t) ≥ 0. Denote by Y(t) the relevant system state at time t. Since the CSI is delayed by different amounts of time depending on the location of the controller, and the controller changes locations over time, Y(t) is defined to include all possible combinations of delayed CSI, as well as the complete history of QLI:

Y(t) = ( S(t − d_max), ..., S(t), Q(0), ..., Q(t) ).   (5.33)

This definition ensures the system state is Markovian. Note that the system state Y(t) is not completely available to the controller, since each node has delayed CSI. Because d_max is the largest CSI delay in the network, values of S(τ) for τ < t − d_max do not affect the evolution of the system.
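The T-slot bound (5.31) can be sanity-checked against the usual per-slot queue recursion Q(t+1) = [Q(t) − D(t)]^+ + A(t). The sketch below is our own check, not part of the thesis, and verifies the bound on random arrival and departure sequences:

```python
import random

def evolve(Q0, arrivals, departures):
    """Per-slot queue recursion: serve first, then admit arrivals."""
    Q = Q0
    for a, d in zip(arrivals, departures):
        Q = max(Q - d, 0) + a
    return Q

def t_slot_bound(Q0, arrivals, departures):
    """Right-hand side of (5.31): [Q(t) - sum D]^+ + sum A."""
    return max(Q0 - sum(departures), 0) + sum(arrivals)

# Randomized check of (5.31) over many sample paths.
rng = random.Random(1)
for _ in range(1000):
    Q0 = rng.randint(0, 5)
    A = [rng.randint(0, 1) for _ in range(20)]
    D = [rng.randint(0, 1) for _ in range(20)]
    assert evolve(Q0, A, D) <= t_slot_bound(Q0, A, D)
```

The inequality is tight exactly when no departure opportunity is wasted on an empty queue, which matches the intuition given above for why (5.31) is an inequality.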
Due to the ergodicity of the finite-state Markov chain governing the channel state process, for any δ > 0, there exists an N such that the probability of the channel state conditioned on the channel state N slots in the past is within δ of the steady state probability of the Markov chain:

| P(S(t) = s | S(t − N)) − P(S(t) = s) | ≤ δ.   (5.34)

Define T_SS(ε) to be a large constant such that when N = T_SS(ε), (5.34) is satisfied for δ = ε / (2|S|), where |S| = 2^M. In other words,

| P(S(t) = s | S(t − T_SS(ε))) − P(S(t) = s) | ≤ ε / (2|S|).   (5.35)

T_SS is related to the time it takes the Markov chain to approach its steady state distribution.

Dynamic Controller Placement and Scheduling (DCPS) Policy

In this section, we propose the dynamic controller placement and scheduling policy, and show that this policy stabilizes the network whenever the arrival rate vector is interior to the capacity region Λ. Additionally, this proves the sufficient condition of Theorem 20. While the problem formulation allows the controller to be repositioned every N time-slots, in this section we prove throughput optimality for N = 1. The extension to general N is discussed in Section 5.3.1.

Theorem 21. Consider the dynamic controller placement and scheduling (DCPS) policy, which operates in two steps. First, choose a controller by solving the following optimization as a function of the delayed queue backlogs Q_i(t − τ_Q):

r∗ = arg max_r Σ_{s∈S} P_S(S(t − d_r) = s) max_i Q_i(t − τ_Q) p^{(d_r(i))}_{s_i,1},   (5.36)

where P_S(s) is the steady state probability of the channel-state process. Then the controller uses its observed CSI S(t − d_{r∗}) = s, and schedules the following queue to transmit:

i∗ = arg max_i Q_i(t − τ_Q) p^{(d_{r∗}(i))}_{s_i,1}.   (5.37)

For any arrival rate λ and ε > 0 satisfying λ + ε1 ∈ Λ, the DCPS policy stabilizes the system if τ_Q ≥ d_max + T_SS(ε), for T_SS(ε) defined in (5.35).

Under policy DCPS, the controller is placed at the node maximizing the expected max-weight schedule, over all possible states.
Then, the controller observes the delayed CSI and schedules the max-weight schedule for transmission as in [48] and [69]. Moving the controller to nodes with high backlog increases the throughput at those nodes, keeping the system stable. Theorem 21 is proved in Section 5.5.1. The proof uses a Lyapunov drift technique [48], and shows that as the system backlogs grow large, the drift becomes negative, implying system stability. We consider the Lyapunov drift over a T-slot window, where T is large enough that the system reaches its steady state distribution. The throughput-optimal controller placement uses delayed QLI Q(t − τ_Q). The delay τ_Q is sufficiently large such that Q(t − τ_Q) is available at every node, i.e. τ_Q ≥ d_max. Furthermore, we require that τ_Q ≥ d_max + T_SS(ε), where T_SS(ε) is the time required for the channel state process to approach its steady state distribution (i.e. the mixing time of the Markov process). Even though QLI is available with much less delay, the controller must use an older version of the QLI for throughput optimality. The reasoning behind this is related to the fact that long queues are typically located at nodes with OFF channels; however, if the QLI is sufficiently delayed, it is independent of the current channel state. This property of the optimal policy is investigated further in Chapter 6.

Example: Homogeneous Delays

Figure 5-11: Example star network topology where each node measures its own channel state instantaneously, and has d-step delayed CSI of each other node.

For specific topologies, the throughput-optimal controller placement in (5.36) takes on a simpler form. In particular, this section examines topologies for which each node is equidistant from all other nodes, as in Figure 5-11.

Corollary 5. Consider a system of M nodes, where only one can transmit at each time. Assume the controller has full knowledge of its own channel state and d-slot delayed CSI for each other channel, as in Figure 5-11.
At time t, the DCPS policy places the controller at the node with the largest backlog at time t − τ_Q:

r∗ = arg max_r Q_r(t − τ_Q).   (5.38)

Corollary 5 is proven by showing that the expression in (5.36) simplifies to (5.38) in the setting of homogeneous delays, which follows from the symmetry of the system. A detailed proof is provided in the Appendix. Note that the queue lengths in the above corollary must still be delayed according to Theorem 21.

5.2.3 Controller Placement With Global Delayed CSI

In the previous section, the throughput-optimal joint controller placement and scheduling policy was presented under the restriction to policies which use only delayed QLI for controller placement. The motivation behind this restriction is that old delayed QLI is available at each node, allowing the controller location to be computed without communication between nodes. In this vein, the channel state of each node d_max slots ago is also globally available knowledge, since d_max is the largest CSI delay in the network. If the network has a small diameter, or a high degree of memory, this additional CSI has a significant impact on performance. In this section, we characterize the new throughput region, and propose an extension to the DCPS policy which stabilizes the system for all arrival rates within this stability region.

Throughput Region

The new throughput region is computed by solving the following LP.

Maximize: ε
Subject to:
λ_i + ε ≤ Σ_{s∈S} P(S(t − d_max) = s) Σ_{r=1}^{M} β_r(s) Σ_{s'∈S} P(S(t − d_r) = s' | S(t − d_max) = s) α_i^r(s') p^{(d_r(i))}_{s'_i,1},  ∀i ∈ {1, ..., M}
Σ_{i=1}^{M} α_i^r(s') ≤ 1,  ∀s' ∈ S, r ∈ {1, ..., M}
α_i^r(s') ≥ 0,  ∀s' ∈ S, i, r ∈ {1, ..., M}
Σ_{r=1}^{M} β_r(s) ≤ 1,  ∀s ∈ S
β_r(s) ≥ 0,  ∀s ∈ S, r ∈ {1, ..., M}   (5.39)

This LP is an extension of (5.14), allowing β_r to be a function of S(t − d_max). The optimization variables β_r(s) and α_i^r(s') correspond to controller placement and link scheduling policies respectively.
Note that p^{(d_r(i))}_{s_i,1} is the k-step transition probability (with k = d_r(i)) of the Markov channel state process (Figure 5-1). The throughput region Λ is the set of all non-negative arrival rate vectors λ such that there exists a feasible solution to (5.39) for which ε ≥ 0.

Theorem 22 (Throughput Region). For any non-negative arrival rate vector λ, the system is stabilized by some policy P ∈ Π if and only if λ ∈ Λ.

Necessity is shown in Lemma 16, and sufficiency is shown in Theorem 23 by proposing a throughput-optimal joint scheduling and controller placement algorithm, and proving that for all λ ∈ Λ, that policy stabilizes the system.

Lemma 16. Suppose there exists a policy P ∈ Π that stabilizes the system. Then, there exist variables β_r(s) and α_i^r(s') such that (5.39) has a solution with ε ≥ 0.

Lemma 16 shows that for all λ ∈ Λ, there exists a stationary policy STAT ∈ Π that stabilizes the system, which places the controller at r with probability β_r(S) when the maximally delayed CSI is S(t − d_max) = S, and schedules i to transmit with probability α_i^r(S') when the delayed CSI at controller r is S(t − d_r) = S'.

Dynamic Controller Placement and Scheduling (DCPS) Policy

Consider the queueing model of Section 5.2.2, which holds for the case when controller placement uses delayed CSI as well as QLI. In this section, we extend the dynamic controller placement and scheduling (DCPS) policy of Section 5.2.2 to utilize delayed CSI, and show that this policy stabilizes the system for all arrival rates within Λ. This proves the sufficient condition of Theorem 22.

Theorem 23. Consider the modified DCPS policy, which operates in two steps. First, choose a controller by solving the following optimization as a function of the delayed queue backlogs Q(t − τ_Q) and delayed CSI S(t − d_max):

r∗ = arg max_r Σ_{s∈S} P(S(t − d_r) = s | S(t − d_max)) max_i Q_i(t − τ_Q) p^{(d_r(i))}_{s_i,1}.   (5.40)

The controller observes CSI S(t − d_{r∗}) = s, and schedules the following queue to transmit:
i* = arg max_i Q_i(t − τ_Q) p^{d_{r*}(i)}_{s_i,1}    (5.41)

The DCPS policy in (5.40) and (5.41) is throughput optimal if τ_Q > d_max.

The proof of Theorem 23 is given in the Appendix, and follows the steps of the proof of Theorem 21 with modifications to the conditioning throughout. Under the DCPS policy, the controller is placed at the node maximizing the expected max-weight schedule, over all possible states, where this expectation is conditioned on the globally available delayed CSI. The controller then observes the delayed CSI and activates the max-weight schedule for transmission, following [48] and [69]. Note that for controller placement policies which use only QLI, a very large QLI delay is required by the DCPS policy, since the channel state must be independent of the queue lengths at that time. On the other hand, when the controller placement policy also depends on the delayed CSI S(t − d_max), the queue length delay only needs to be larger than d_max. This is because conditioning on the CSI removes the dependence of the channel state on the delayed QLI.

5.3 Simulation Results

To begin, we simulate a 6-queue system with Bernoulli arrival processes of different rates. Assume the controller has instantaneous CSI for its own channel, and homogeneously delayed (2 slots) CSI for each other channel. For each symmetric arrival rate vector λ, we simulate the evolution of the system over 100,000 time slots, and compute the average system backlog over this time. The results are plotted in Figure 5-12. Clearly, for small arrival rates, the average queue length remains very small. As the arrival rates increase toward the boundary of the stability region, the average system backlog starts to increase. When the arrival rate grows beyond the stability region, the average queue length increases rapidly, since packets arrive faster than they can be served, implying that the system is unstable in this region.
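The simulated setup can be reproduced in miniature. The sketch below simulates a two-queue version (the thesis simulates six queues) under a DCPS-style rule: the controller is the node with the largest delayed backlog, and it schedules via the max-weight rule (5.41) with delayed CSI. The parameters, the tie-breaking, and the arrival/service ordering within a slot are all assumptions made for illustration.

```python
import random

random.seed(1)

M, p, q = 2, 0.1, 0.1          # two queues, ON/OFF channel parameters (illustrative)
d, tauQ = 2, 100               # CSI delay to the remote channel, QLI delay
lam = [0.3, 0.3]               # arrival rates assumed inside the stability region
T = 20000

def p_on(s, k):
    """k-step probability that a channel observed in state s is ON now."""
    on = float(s)
    for _ in range(k):
        on = on * (1 - q) + (1 - on) * p
    return on

S = [random.randint(0, 1) for _ in range(M)]   # current channel states
S_hist = [[S[i]] for i in range(M)]            # per-channel state history
Q = [0] * M
Q_hist = [list(Q)]                             # queue-length history
backlog_sum = 0

for t in range(T):
    # delayed QLI (clamped during the initial transient)
    Qd = Q_hist[max(0, t - tauQ)]
    # controller placement: node with the largest delayed backlog
    r = max(range(M), key=lambda i: Qd[i])
    # max-weight schedule using delayed CSI, in the spirit of (5.41)
    def weight(i):
        delay = 0 if i == r else d
        s = S_hist[i][max(0, t - delay)]
        return Qd[i] * p_on(s, delay)
    i_star = max(range(M), key=weight)
    # a departure occurs only if the scheduled channel is currently ON
    if S[i_star] == 1 and Q[i_star] > 0:
        Q[i_star] -= 1
    # Bernoulli arrivals, then channel evolution
    for i in range(M):
        if random.random() < lam[i]:
            Q[i] += 1
        if S[i] == 1:
            S[i] = 0 if random.random() < q else 1
        else:
            S[i] = 1 if random.random() < p else 0
        S_hist[i].append(S[i])
    Q_hist.append(list(Q))
    backlog_sum += sum(Q)

avg_backlog = backlog_sum / T
```

For arrival rates inside the region the time-averaged backlog stays bounded; pushing λ past the boundary makes it grow roughly linearly in the simulation length, mirroring the behavior described above.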
Figure 5-12 compares several controller placement policies. First, we plot the results of a fixed controller policy, as in Section 5.1. This is compared with a policy that chooses a controller at each time uniformly at random. Note that this random policy is optimal when the arrival rate to each node is the same, as it represents the correct stationary policy to stabilize the system. The red curve corresponds to the DCPS policy using QLI for controller placement, and the green curve corresponds to the DCPS policy using both QLI and delayed CSI to place the controller. First, dynamically changing the controller location provides a 7% increase in the capacity region over static controller placement. Additionally, observe that in this example, choosing a controller based on homogeneously delayed CSI as well as QLI offers a 6% increase in the capacity region over the region for policies restricted to using only QLI. In Figure 5-12a, the DCPS policy uses 2-step delayed QLI to place the controller. In this case, the DCPS policy fails to stabilize the system for the same set of arrival rates as the time-sharing policy, implying that the DCPS policy is not throughput optimal. However, in Figure 5-12b, the delay on the QLI is increased to 100 time slots. In this scenario, the DCPS policy does stabilize the system for all symmetric arrival rates in the stability region. Thus, using further-delayed information is required for throughput optimality. Note that using further-delayed QLI in the DCPS policy that also uses CSI does not affect the stability of the system. The results in Figure 5-13 illustrate the effect of the QLI delay on the stability of the system. This figure presents four different values of τ_Q, the delay of the QLI used by the controller placement policy. The black dashed line corresponds to τ_Q = 0, the blue dash-dot line to τ_Q = 4, the red dotted line to τ_Q = 8, and the green solid line to τ_Q = 50.
As τ_Q increases, the system remains stable for more arrival rates. In this example, using sufficiently delayed QLI yields a 16% increase in the stability region of the system. Additionally, we simulate the controller placement problem over a network, to compare dynamic controller placement with the static controller placement of Section 5.1.

[Figure 5-12: Simulation results for different controller placement policies, with channel model parameters p = 0.1, q = 0.1. (a) 2-step delayed QLI; (b) 100-step delayed QLI.]

[Figure 5-13: Effect of QLI delay on system stability, for p = q = 0.1. Each curve corresponds to a different value of τ_Q.]

[Figure 5-14: Two-level binary tree topology.]

Consider the simple network in Figure 5-14. Figure 5-15 analyzes the stability of the system under different controller placement policies. The black solid line represents the DCPS policy, with QLI delay τ_Q = 150. This policy is compared with the policy that selects the controller uniformly at random and the policy that places the controller at node 3. These results show that relocating the controller according to the DCPS policy improves over both the optimal static placement and equal time-sharing between controller placements.

[Figure 5-15: Results for different controller placement policies on the tree network in Figure 5-14: DCPS policy with τ_Q = 150, equal time-sharing, and fixed controller at node 3. Simulation ran for 40,000 time slots with p = q = 0.3.]

Figure 5-16 shows the fraction of time each node is selected as the controller under the DCPS policy for the binary-tree topology of Figure 5-14. For small transition probabilities (e.g., p = q = 0.1), the central node 3 is chosen as the controller most frequently. When the transition probabilities increase (e.g., p = q = 0.3), more time is spent with nodes 2 and 4 as controllers. This corresponds with the analysis for static controller placement in Section 5.1.2.
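The expected max-weight placement objective can be evaluated exactly on small topologies. The sketch below scores each node of a 7-node binary tree, assuming an in-order labeling of the tree in Figure 5-14 (root 3 with children 1 and 5, leaves 0, 2, 4, 6), CSI delays equal to hop distances, equal delayed backlogs, and p = q = 0.1; all of these choices are illustrative assumptions. It uses the standard closed form for the k-step ON probability of a two-state Markov chain.

```python
from itertools import product

p = q = 0.1                      # channel transition probabilities (illustrative)
pi = p / (p + q)                 # stationary ON probability

def p_on(s, k):
    """Closed-form k-step probability of ON given state s, k slots ago."""
    lam2 = 1.0 - p - q           # second eigenvalue of the 2-state chain
    return pi + (s - pi) * lam2**k

# Assumed in-order labeling of the two-level binary tree of Figure 5-14:
# root 3 with children 1 and 5; leaves 0, 2, 4, 6.
edges = [(3, 1), (3, 5), (1, 0), (1, 2), (5, 4), (5, 6)]
M = 7

def hop_distances(root):
    """BFS hop distances from a candidate controller to every node."""
    adj = {i: [] for i in range(M)}
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    dist, frontier = {root: 0}, [root]
    while frontier:
        nxt = []
        for u in frontier:
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    nxt.append(v)
        frontier = nxt
    return [dist[i] for i in range(M)]

def placement_score(r, Q):
    """E_S[ max_i Q_i * p_on(S_i, d_r(i)) ]: expected max-weight from node r."""
    d = hop_distances(r)
    total = 0.0
    for s in product([0, 1], repeat=M):          # enumerate joint channel states
        prob = 1.0
        for si in s:
            prob *= pi if si == 1 else 1 - pi
        total += prob * max(Q[i] * p_on(s[i], d[i]) for i in range(M))
    return total

Q = [1.0] * M                                    # equal backlogs for illustration
scores = [placement_score(r, Q) for r in range(M)]
```

With equal backlogs the score depends only on each candidate's multiset of hop delays, so symmetric nodes tie, and any interior node dominates a leaf; with queue dynamics added, the empirical placement fractions of Figure 5-16 emerge.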
Moreover, as the arrival rate increases toward the boundary of the stability region, the results resemble the static results even more closely. Note that the DCPS policy can be applied to any network topology, but we only consider small topologies in this work due to the computational complexity of computing the optimal controller location.

[Figure 5-16: Fraction of time each node is selected as the controller under DCPS for the topology in Figure 5-14, for symmetric arrival rates (a) λ = 0.2 and (b) λ = 0.25. Blue bars correspond to the system with p = q = 0.1, and red bars to the system with p = q = 0.3.]

5.3.1 Infrequent Controller Relocation

Throughout this chapter, we have assumed that a new controller is chosen at every time slot. This is justified by ensuring that the controller placement algorithm depends only on information that is available at each node in the network; thus, no additional communication overhead is required to compute the controller placement. However, there may be an additional cost associated with relocating the controller due to the computation required. Therefore, in this section, we consider the case in which controller placement occurs infrequently. Consider a modified version of the controller placement problem, in which the controller is relocated every N time slots. As discussed in Section 5.2.2, the throughput region is not affected by infrequent controller placement. Lemma 15 shows that any arrival rate λ ∈ Λ corresponds to a stationary policy which stabilizes the system, and the throughput region Λ is formed by time-sharing between controller placements.
Consequently, what matters for throughput is not the frequency of controller relocation, but the overall fraction of time spent in each controller state. The DCPS policy of Section 5.2.2 extends directly to the case of infrequent controller placement as follows.

Theorem 24. Consider the dynamic controller placement and scheduling policy (DCPS), which operates in two steps. First, at each time t = kN, choose a controller by solving the following optimization as a function of the delayed queue backlogs Q_i(kN − τ_Q):

r* = arg max_r Σ_{s∈S} P_S(s) max_i Q_i(kN − τ_Q) p^{d_r(i)}_{s_i,1}    (5.42)

where P_S(s) is the steady-state probability of the channel-state process. At the subsequent time slots t = kN + j, the controller uses its observed CSI S(kN + j − d_{r*}(i)) = s, and schedules the following queue to transmit:

i* = arg max_i Q_i(kN − τ_Q) p^{d_{r*}(i)}_{s_i,1}    (5.43)

For any arrival rate λ and ε > 0 satisfying λ + ε1 ∈ Λ, the DCPS policy stabilizes the system if τ_Q ≥ d_max + T_SS(ε), for T_SS(ε) defined in (5.35).

The DCPS policy of Theorem 24 differs from that of Theorem 21 in that controller placement decisions are only made in time slots which are multiples of N, but the controller placement calculation is the same as in Theorem 21. The scheduling portion of Theorem 24 uses the delayed QLI with respect to the time at which the controller was placed, rather than the current time slot. This additional delay in QLI does not affect the throughput optimality of the policy. The proof of Theorem 24 follows similarly to the proof of Theorem 21, except using a T-slot drift argument at each time slot t = kN rather than at every time slot.

5.4 Conclusion

In Chapter 4, we showed that delayed CSI is inherent in centralized schemes, and that this delay is related to the topology of the network.
This chapter studied the impact of controller location on the performance of centralized scheduling in wireless networks, as the location of the controller directly influences the delays at which CSI is available to the controller. First, we formulated the optimal static controller placement problem, and developed near-optimal, low-complexity heuristics to place controllers in large networks. We then considered dynamically placing the controller, using queue length information (QLI) to move the controller toward heavily backlogged areas of the network. We characterized the throughput region under dynamic controller placement, and proposed a throughput-optimal joint controller placement and scheduling policy. This policy uses significantly delayed QLI to place the controllers, and the CSI available at the controller to schedule links. We extended this policy to the case where CSI can also be used to place the controller. An interesting result of this chapter is that when the controller placement depends only on delayed QLI, the throughput-optimal policy uses a highly delayed version of the QLI, even if fresher QLI is available. This is because QLI is correlated with CSI, particularly when there is a high degree of memory in the system. This is explored further in Chapter 6.

5.5 Appendix

5.5.1 Proof of Theorem 21

Theorem 21: Consider the dynamic controller placement and scheduling (DCPS) policy, which operates in two steps. First, choose a controller by solving the following optimization as a function of the delayed queue backlogs Q_i(t − τ_Q):

r* = arg max_r Σ_{s∈S} P_S(s) max_i Q_i(t − τ_Q) p^{d_r(i)}_{s_i,1}    (5.44)

where P_S(s) is the steady-state probability of the channel-state process. Then, the controller uses its observed CSI S(t − d_{r*}(i)) = s, and schedules the following queue to transmit:
i* = arg max_i Q_i(t − τ_Q) p^{d_{r*}(i)}_{s_i,1}    (5.45)

For any arrival rate λ and ε > 0 satisfying λ + ε1 ∈ Λ, the DCPS policy stabilizes the system if τ_Q ≥ d_max + T_SS(ε), for T_SS(ε) defined in (5.35).

Proof of Theorem 21. Define the following quadratic Lyapunov function:

L(Q(t)) = (1/2) Σ_{i=1}^{M} Q_i²(t).    (5.46)

The T-step Lyapunov drift is computed as

Δ_T(Y(t)) = E[ L(Q(t + T)) − L(Q(t)) | Y(t) ].    (5.47)

We show that under the proposed policy, the T-step Lyapunov drift is negative for large backlogs, implying the stability of the system under the throughput-optimal max-weight policy for all arrival rates in the interior of Λ, by the Foster-Lyapunov criteria [49]. To prove throughput optimality of the DCPS policy, we bound the Lyapunov drift under DCPS by combining (5.32), (5.46), and (5.47), and show that for large queue lengths the drift is negative. Let D_i(t) = D_i^DCPS(t) denote the departure process of policy DCPS. Consider the T-step Lyapunov drift for T > τ_Q:

Δ_T(Y(t)) ≤ E[ (1/2) Σ_{i=1}^{M} ( (Σ_{k=0}^{T−1} A_i(t+k))² + (Σ_{k=0}^{T−1} D_i(t+k))² ) | Y(t) ]    (5.48)
  + E[ Σ_{i=1}^{M} Q_i(t) ( Σ_{k=0}^{T−1} A_i(t+k) − Σ_{k=0}^{T−1} D_i(t+k) ) | Y(t) ]    (5.49)
≤ B + E[ Σ_{i=1}^{M} Q_i(t) ( Σ_{k=0}^{T−1} A_i(t+k) − Σ_{k=0}^{T−1} D_i(t+k) ) | Y(t) ]    (5.50)

where B is a constant, finite due to the boundedness of the second moments of the arrival and departure processes. The difference between queue lengths at any two times t and s is bounded using the following inequality:

|Q_i(t) − Q_i(s)| ≤ |t − s|,    (5.51)

which holds since at most one arrival and no departures (or vice versa) can occur in each slot. Using this inequality, a relationship is established between current and delayed queue lengths:

Q_i(t) ≤ Q_i(t + k − τ_Q) + |k − τ_Q|    (5.52)
Q_i(t) ≥ Q_i(t + k − τ_Q) − |k − τ_Q|    (5.53)

The inequalities in (5.52) and (5.53) are used in (5.49) to upper bound the Lyapunov drift in terms of the delayed QLI in each slot, Q_i(t + k − τ_Q).
Δ_T(Y(t)) ≤ B + E[ Σ_{k=0}^{T−1} Σ_{i=1}^{M} Q_i(t)A_i(t+k) − Σ_{k=0}^{T−1} Σ_{i=1}^{M} Q_i(t)D_i(t+k) | Y(t) ]    (5.54)
≤ B + E[ Σ_{k=0}^{T−1} Σ_{i=1}^{M} (Q_i(t+k−τ_Q) + |k−τ_Q|)λ_i − Σ_{k=0}^{T−1} Σ_{i=1}^{M} (Q_i(t+k−τ_Q) − |k−τ_Q|)D_i(t+k) | Y(t) ]    (5.55)
= B + Σ_{k=0}^{T−1} |k−τ_Q| E[ Σ_{i=1}^{M} (λ_i + D_i(t+k)) | Y(t) ] + E[ Σ_{k=0}^{T−1} Σ_{i=1}^{M} Q_i(t+k−τ_Q)λ_i − Σ_{k=0}^{T−1} Σ_{i=1}^{M} Q_i(t+k−τ_Q)D_i(t+k) | Y(t) ]    (5.56)
≤ B + 2M Σ_{k=0}^{T−1} |k−τ_Q| + E[ Σ_{k=0}^{T−1} Σ_{i=1}^{M} Q_i(t+k−τ_Q)(λ_i − D_i(t+k)) | Y(t) ]    (5.57)
≤ B + 2MT² + E[ Σ_{k=0}^{T−1} Σ_{i=1}^{M} Q_i(t+k−τ_Q)(λ_i − D_i(t+k)) | Y(t) ]    (5.58)
≤ B' + E[ Σ_{k=0}^{T−1} Σ_{i=1}^{M} Q_i(t+k−τ_Q)(λ_i − D_i(t+k)) | Y(t) ]    (5.59)

Equation (5.55) follows by replacing the expected value of the arrival process with the arrival rate λ_i. Equation (5.57) follows by upper bounding the per-slot arrival and departure rates each by 1. Equation (5.58) follows from the fact that T ≥ τ_Q. Equation (5.59) follows by defining B' = B + 2MT².

Throughout the proof, we use the law of iterated expectations [8], which states that for random variables X, Y, and Z, the conditional expectation of X given Y can be expanded as

E_X[X | Y] = E_Z[ E_X[X | Y, Z] | Y ],    (5.60)

where the subscript on the expectation indicates the random variable over which the expectation is taken. The remainder of the proof follows by showing that as queue lengths grow large, the Lyapunov drift is upper bounded by a negative quantity. Consider the second term on the right-hand side of (5.59). This expectation is rewritten by conditioning on the delayed QLI at slot t + k and applying the law of iterated expectations:

E[ Σ_{k=0}^{T−1} Σ_{i=1}^{M} Q_i(t+k−τ_Q)(λ_i − D_i(t+k)) | Y(t) ]
= E[ Σ_{k=0}^{T−1} E[ Σ_{i=1}^{M} Q_i(t+k−τ_Q)(λ_i − D_i(t+k)) | Q(t+k−τ_Q) ] | Y(t) ]    (5.61)

To bound (5.61), we require the channel state at slot t + k to be independent of Y(t), which only holds if k is sufficiently large.
Thus, we break the summation in (5.59) over the T slots into two parts: a small number of slots for which k is small, and a larger number of slots for which k is large. An overly conservative bound is used for k < T_SS + d_max, but the frame size T is chosen to ensure that the first T_SS + d_max slots are a small fraction of the overall T slots. We drop the argument of the function T_SS(ε), but the dependence on ε is clear.

Δ_T(Y(t)) ≤ B' + E[ Σ_{k=0}^{T−1} Σ_{i=1}^{M} Q_i(t+k−τ_Q)(λ_i − D_i(t+k)) | Y(t) ]
≤ B' + Σ_{k=0}^{T_SS+d_max−1} E[ Σ_{i=1}^{M} Q_i(t+k−τ_Q)(λ_i − D_i(t+k)) | Y(t) ]
  + Σ_{k=T_SS+d_max}^{T−1} E[ Σ_{i=1}^{M} Q_i(t+k−τ_Q)(λ_i − D_i(t+k)) | Y(t) ]    (5.62)

For values of k < T_SS + d_max, the upper bound follows by trivially upper bounding the arrivals by 1 and lower bounding the departures by 0 in each slot:

Σ_{k=0}^{T_SS+d_max−1} E[ Σ_{i=1}^{M} Q_i(t+k−τ_Q)(λ_i − D_i(t+k)) | Y(t) ]
≤ Σ_{k=0}^{T_SS+d_max−1} Σ_{i=1}^{M} Q_i(t+k−τ_Q)    (5.63)
≤ Σ_{k=0}^{T_SS+d_max−1} Σ_{i=1}^{M} Q_i(t−τ_Q) + Σ_{k=0}^{T_SS+d_max−1} Σ_{i=1}^{M} k    (5.64)
≤ (T_SS + d_max) Σ_{i=1}^{M} Q_i(t−τ_Q) + (1/2)(T_SS + d_max)² M    (5.65)

where (5.64) follows from (5.51). Now consider the time slots for which k ≥ T_SS + d_max. For these slots, we have k ≥ τ_Q ≥ d_max ≥ d_r(i). For these time slots, we bound the Lyapunov drift by computing the conditional departure rate under the DCPS policy, and showing that this policy must have a departure rate at least as high as that of any stationary policy. From Lemma 15, we know that a stationary policy exists which stabilizes the system, proving that the DCPS policy also stabilizes the system. To begin, the interior expectation in (5.61) is expanded as

E[ Σ_{i=1}^{M} Q_i(t+k−τ_Q)(λ_i − D_i(t+k)) | Q(t+k−τ_Q), Y(t) ]    (5.66)
= Σ_{i=1}^{M} Q_i(t+k−τ_Q) ( λ_i − E[ D_i(t+k) | Q(t+k−τ_Q), Y(t) ] )    (5.67)
= Σ_{i=1}^{M} Q_i(t+k−τ_Q)λ_i − Σ_{i=1}^{M} Q_i(t+k−τ_Q) E[ D_i(t+k) | Q(t+k−τ_Q), Y(t) ].    (5.68)

Consider the right-most expression in (5.68).
Let φ_i^r be a binary indicator variable denoting whether queue i is scheduled under policy DCPS, as a function of the delayed QLI and the delayed CSI at controller r, and let ψ_r be an indicator variable denoting whether node r is selected as the controller, as a function of the delayed QLI only.

Σ_{i=1}^{M} Q_i(t+k−τ_Q) E[ D_i(t+k) | Q(t+k−τ_Q), Y(t) ]
= Σ_{i=1}^{M} Q_i(t+k−τ_Q) E[ Σ_{r=1}^{M} ψ_r(Q(t+k−τ_Q)) φ_i^r(S(t+k−d_r), Q(t+k−τ_Q)) S_i(t+k) | Q(t+k−τ_Q), Y(t) ]    (5.69)
= Σ_{i=1}^{M} Σ_{r=1}^{M} ψ_r(Q(t+k−τ_Q)) Q_i(t+k−τ_Q) E[ E[ φ_i^r(S(t+k−d_r), Q(t+k−τ_Q)) S_i(t+k) | S_i(t+k−d_r(i)) ] | Q(t+k−τ_Q), Y(t) ]    (5.70)
= Σ_{i=1}^{M} Σ_{r=1}^{M} ψ_r(Q(t+k−τ_Q)) Q_i(t+k−τ_Q) E[ φ_i^r(S(t+k−d_r), Q(t+k−τ_Q)) E[ S_i(t+k) | S_i(t+k−d_r(i)) ] | Q(t+k−τ_Q), Y(t) ]    (5.71)
= Σ_{i=1}^{M} Σ_{r=1}^{M} ψ_r(Q(t+k−τ_Q)) Q_i(t+k−τ_Q) E[ φ_i^r(S(t+k−d_r), Q(t+k−τ_Q)) p^{d_r(i)}_{S_i(t+k−d_r(i)),1} | Q(t+k−τ_Q), Y(t) ]    (5.72)

Equation (5.70) follows since the controller placement under DCPS is completely determined by the delayed QLI, and by applying the law of iterated expectations. Equation (5.71) follows since the link schedule under controller r for policy DCPS is completely determined by the delayed QLI Q(t+k−τ_Q) and the delayed CSI S(t+k−d_r). Lastly, equation (5.72) follows from the k-step transition probability of the Markov chain. Note that the throughput-optimal policy is the one maximizing the expression in (5.72); however, this expectation cannot be computed directly, because it requires the conditional distribution of the channel state sequence given the QLI, which depends on the arrival rates. However, when the QLI is sufficiently delayed, the conditioning on QLI can be removed as follows.
P(S(t+k−d_r) = s | Q(t+k−τ_Q), Y(t))
= Σ_{s'∈S} P(S(t+k−τ_Q) = s' | Q(t+k−τ_Q), Y(t)) P(S(t+k−d_r) = s | S(t+k−τ_Q) = s', Q(t+k−τ_Q), Y(t))    (5.73)
= Σ_{s'∈S} P(S(t+k−τ_Q) = s' | Q(t+k−τ_Q), Y(t)) P(S(t+k−d_r) = s | S(t+k−τ_Q) = s')    (5.74)
≥ Σ_{s'∈S} P(S(t+k−τ_Q) = s' | Q(t+k−τ_Q), Y(t)) ( P(S(t+k−d_r) = s) − ε/(2|S|) )    (5.75)
= P(S(t+k−d_r) = s) − ε/(2|S|)    (5.76)

Equation (5.73) follows from the law of total probability. Equation (5.74) holds using the fact that k ≥ τ_Q and the Markov property of the channel state: the state at time t+k−d_r is conditionally independent of Y(t) and Q(t+k−τ_Q) given S(t+k−τ_Q). Equation (5.75) holds because τ_Q ≥ T_SS + d_max, and thus, by the definition of T_SS in (5.35), the conditional state distribution is within ε/(2|S|) of the stationary distribution. Consequently, the expression in (5.72) can be bounded in terms of an unconditional expectation:

E[ φ_i^r(S(t+k−d_r), Q(t+k−τ_Q)) p^{d_r(i)}_{S_i(t+k−d_r(i)),1} | Q(t+k−τ_Q), Y(t) ]
= Σ_{s∈S} P(S(t+k−d_r) = s | Q(t+k−τ_Q), Y(t)) φ_i^r(s, Q(t+k−τ_Q)) p^{d_r(i)}_{s_i,1}    (5.77)
≥ Σ_{s∈S} P(S(t+k−d_r) = s) φ_i^r(s, Q(t+k−τ_Q)) p^{d_r(i)}_{s_i,1} − (ε/(2|S|)) Σ_{s∈S} φ_i^r(s, Q(t+k−τ_Q)) p^{d_r(i)}_{s_i,1}    (5.78)
≥ E_{S(t+k−d_r)}[ φ_i^r(S(t+k−d_r), Q(t+k−τ_Q)) p^{d_r(i)}_{S_i(t+k−d_r(i)),1} ] − ε/2    (5.79)

The inequality in (5.79) follows by upper bounding φ_i^r(s, Q) ≤ 1 and p^{d_r(i)}_{s_i,1} ≤ 1.
Plugging (5.79) into (5.72) yields

Σ_{i=1}^{M} Σ_{r=1}^{M} ψ_r(Q(t+k−τ_Q)) Q_i(t+k−τ_Q) E[ φ_i^r(S(t+k−d_r), Q(t+k−τ_Q)) p^{d_r(i)}_{S_i(t+k−d_r(i)),1} | Q(t+k−τ_Q), Y(t) ]
≥ Σ_{i=1}^{M} Σ_{r=1}^{M} ψ_r(Q(t+k−τ_Q)) Q_i(t+k−τ_Q) ( E[ φ_i^r(S(t+k−d_r), Q(t+k−τ_Q)) p^{d_r(i)}_{S_i(t+k−d_r(i)),1} ] − ε/2 )    (5.80)
≥ −(ε/2) Σ_{i=1}^{M} Q_i(t+k−τ_Q) + Σ_{i=1}^{M} Σ_{r=1}^{M} ψ_r(Q(t+k−τ_Q)) Q_i(t+k−τ_Q) E[ φ_i^r(S(t+k−d_r), Q(t+k−τ_Q)) p^{d_r(i)}_{S_i(t+k−d_r(i)),1} ]    (5.81)

where the inequality in (5.81) follows from upper bounding Σ_{r=1}^{M} ψ_r(Q(t+k−τ_Q)) by 1. Under the DCPS policy, the service rate as a function of the controller and the delayed CSI observation is given by:

Σ_{i=1}^{M} φ_i^r(s, Q(t+k−τ_Q)) Q_i(t+k−τ_Q) p^{d_r(i)}_{s_i,1} = max_i Q_i(t+k−τ_Q) p^{d_r(i)}_{s_i,1}.    (5.82)

Similarly, the expected value of the departure process can be rewritten using (5.82) and the structure of the controller placement policy of DCPS:

Σ_{r=1}^{M} ψ_r(Q(t+k−τ_Q)) E_S[ max_i Q_i(t+k−τ_Q) p^{d_r(i)}_{S_i,1} ] = max_r E_S[ max_i Q_i(t+k−τ_Q) p^{d_r(i)}_{S_i,1} ]    (5.83)

Combining equations (5.83) and (5.81), and plugging into equation (5.68), yields

Σ_{i=1}^{M} Q_i(t+k−τ_Q)λ_i − Σ_{i=1}^{M} Q_i(t+k−τ_Q) E[ D_i(t+k) | Q(t+k−τ_Q), Y(t) ]    (5.84)
≤ Σ_{i=1}^{M} Q_i(t+k−τ_Q)λ_i − max_r E_S[ max_i Q_i(t+k−τ_Q) p^{d_r(i)}_{S_i,1} ] + (ε/2) Σ_{i=1}^{M} Q_i(t+k−τ_Q)    (5.85)

Now, we reintroduce the stationary policy of Lemma 15 to complete the bound. Recall that for any λ ∈ Λ, there exists a stationary policy which assigns controller r with probability β_r, and schedules node i for transmission with probability α_i^r(s) when the delayed CSI is s ∈ S, such that

λ_i + ε ≤ Σ_{s∈S} P_S(s) Σ_{r=1}^{M} β_r α_i^r(s) p^{d_r(i)}_{s_i,1}    ∀i ∈ {1, . . . , M}    (5.86)

Note that the ε in (5.35) and the ε in (5.86) are chosen to be equal. Define µ_i^STAT to be the average departure rate of queue i under this stationary policy.
In other words,

µ_i^STAT ≜ Σ_{s∈S} P_S(s) Σ_{r=1}^{M} β_r α_i^r(s) p^{d_r(i)}_{s_i,1}    (5.87)

The expression in (5.85) is rewritten by adding and subtracting identical terms corresponding to the stationary rate µ^STAT:

Σ_{i=1}^{M} Q_i(t+k−τ_Q)λ_i − max_r E_S[ max_i Q_i(t+k−τ_Q) p^{d_r(i)}_{S_i,1} ] + (ε/2) Σ_{i=1}^{M} Q_i(t+k−τ_Q)
  + Σ_{i=1}^{M} Q_i(t+k−τ_Q)µ_i^STAT − Σ_{i=1}^{M} Q_i(t+k−τ_Q)µ_i^STAT    (5.88)
= Σ_{i=1}^{M} Q_i(t+k−τ_Q)(λ_i − µ_i^STAT) + Σ_{i=1}^{M} Q_i(t+k−τ_Q)µ_i^STAT
  − max_r E_S[ max_i Q_i(t+k−τ_Q) p^{d_r(i)}_{S_i,1} ] + (ε/2) Σ_{i=1}^{M} Q_i(t+k−τ_Q)    (5.89)

The first term in (5.89) is bounded using (5.86), which holds because the stationary policy stabilizes the system:

Σ_{i=1}^{M} Q_i(t+k−τ_Q)(λ_i − µ_i^STAT) ≤ −ε Σ_{i=1}^{M} Q_i(t+k−τ_Q)    (5.90)

The second term in (5.89) is bounded by relating the stationary policy to the DCPS policy:

Σ_{i=1}^{M} Q_i(t+k−τ_Q)µ_i^STAT
= Σ_{i=1}^{M} Q_i(t+k−τ_Q) Σ_{s∈S} Σ_{r=1}^{M} P_S(s) β_r α_i^r(s) p^{d_r(i)}_{s_i,1}    (5.91)
= Σ_{r=1}^{M} β_r Σ_{s∈S} P_S(s) Σ_{i=1}^{M} Q_i(t+k−τ_Q) α_i^r(s) p^{d_r(i)}_{s_i,1}    (5.92)
≤ Σ_{r=1}^{M} β_r Σ_{s∈S} P_S(s) max_i Q_i(t+k−τ_Q) p^{d_r(i)}_{s_i,1}    (5.93)
≤ max_r E_S[ max_i Q_i(t+k−τ_Q) p^{d_r(i)}_{S_i,1} ]    (5.94)

Returning to (5.89) and applying the inequalities in (5.90) and (5.94):

E[ Σ_{i=1}^{M} Q_i(t+k−τ_Q)(λ_i − D_i(t+k)) | Q(t+k−τ_Q), Y(t) ]
≤ Σ_{i=1}^{M} Q_i(t+k−τ_Q)(λ_i − µ_i^STAT) + Σ_{i=1}^{M} Q_i(t+k−τ_Q)µ_i^STAT
  − max_r E_S[ max_i Q_i(t+k−τ_Q) p^{d_r(i)}_{S_i,1} ] + (ε/2) Σ_{i=1}^{M} Q_i(t+k−τ_Q)
≤ −(ε/2) Σ_{i=1}^{M} Q_i(t+k−τ_Q)    (5.95)

To conclude the proof, the bound in (5.51) is used to relate the QLI at time t+k−τ_Q back to the queue length at time t−τ_Q, which is known given Y(t):

−(ε/2) Σ_{i=1}^{M} Q_i(t+k−τ_Q) ≤ −(ε/2) Σ_{i=1}^{M} Q_i(t−τ_Q) + (ε/2) M k    (5.96)

Now, we have an upper bound for the slots k ≥ T_SS + d_max to combine with the bound for k < T_SS + d_max.
Plugging these bounds into the drift bound of (5.62) yields

Δ_T(Y(t)) ≤ B' + (T_SS + d_max) Σ_{i=1}^{M} Q_i(t−τ_Q) + (1/2)(T_SS + d_max)² M
  + Σ_{k=T_SS+d_max}^{T−1} E[ −(ε/2) Σ_{i=1}^{M} Q_i(t−τ_Q) + (ε/2) M k | Y(t) ]    (5.97)
≤ B' + (T_SS + d_max) Σ_{i=1}^{M} Q_i(t−τ_Q) + (1/2)(T_SS + d_max)² M
  − (ε/2)(T − T_SS − d_max) Σ_{i=1}^{M} Q_i(t−τ_Q) + Σ_{k=T_SS+d_max}^{T−1} (ε/2) M k    (5.98)
≤ B' + (T_SS + d_max) Σ_{i=1}^{M} Q_i(t−τ_Q) + (ε/4) M (T² − (T_SS + d_max)²)
  + (1/2)(T_SS + d_max)² M − (ε/2)(T − T_SS − d_max) Σ_{i=1}^{M} Q_i(t−τ_Q)    (5.99)
≤ B' + (1/2)(T_SS + d_max)² M (1 − ε/2) + (ε/4) M T²
  + (T_SS + d_max) Σ_{i=1}^{M} Q_i(t−τ_Q) − (ε/2)(T − T_SS − d_max) Σ_{i=1}^{M} Q_i(t−τ_Q)    (5.100)

Thus, for any ξ > 0, T satisfying

T ≥ 2(1 + 2/ε)(T_SS + d_max) + 2ξ/ε    (5.101)

and positive constant K satisfying

K = B' + (1/2)(T_SS + d_max)² M (1 − ε/2) + (ε/4) M T²,    (5.102)

it follows that

Δ_T(Y(t)) ≤ K − ξ Σ_{i=1}^{M} Q_i(t−τ_Q)    (5.103)

Thus, for large enough queue backlogs, the T-slot Lyapunov drift is negative, and it follows from [48] that the overall system is stable under the DCPS policy.

5.5.2 Proof of Corollary 5

Corollary 5: Consider a system of M nodes, of which only one can transmit at each time. Assume the controller has full knowledge of its own channel state and d-slot delayed CSI for each other channel, as in Figure 5-11. At time t, the DCPS policy places the controller at the node with the largest backlog at time t − τ_Q:

r* = arg max_r Q_r(t − τ_Q)    (5.104)

Proof. Recall that the optimal policy at each time is the DCPS policy of Theorem 21, where the controller is chosen to maximize the expected max-weight schedule. Let Q^(1), . . . , Q^(M) be the ordering of the delayed queue lengths Q(t − τ_Q), such that Q^(1) ≥ Q^(2) ≥ . . . ≥ Q^(M). Consider placing the controller at the node corresponding to Q^(1). Let k_2 be the largest index i such that Q^(i) p^d_{11} ≥ Q^(2) p^d_{01}. The expected max-weight is a random variable whose value is determined by the CSI. Let MW_i be the weight of the schedule activated by a controller at the node with the i-th largest queue.
The expected max-weight of a controller at Q^(1) is given by

E[MW_1] = Q^(1) π + Σ_{i=2}^{k_2} Q^(i) π (1−π)^{i−1} p^d_{11} + Q^(2) p^d_{01} (1−π)^{k_2}    (5.105)

Equation (5.105) is derived as follows. Since Q^(1) is the largest queue, if that channel is ON, the max-weight policy transmits from Q^(1). If that channel is OFF, then the belief of that channel is zero, and it will not be used. Transmitting from Q^(j) is optimal only if Q^(i) is OFF for all i < j, since the Q^(i) are sorted in decreasing order. By the definition of k_2, for j > k_2 we have Q^(j) p^d_{11} < Q^(2) p^d_{01}, so it is optimal to schedule Q^(2) when Q^(i) is OFF for all i ≤ k_2.

Now consider placing the controller at the node corresponding to queue Q^(j), for j ≥ 2. Let k_1 be the largest index such that Q^(k_1) p^d_{11} ≥ Q^(1) p^d_{01}. Similarly, define k_j' to be the largest index such that Q^(k_j') p^d_{11} ≥ Q^(j). The expected max-weight is computed for two cases, depending on the relationship between k_1 and k_j'. First, consider the case where Q^(j) ≤ Q^(1) p^d_{01}, i.e., k_1 ≤ k_j'. In this case, it is never optimal to transmit over the channel corresponding to Q^(j), regardless of its delayed CSI. The expected max-weight is given by

E[MW_j] = π p^d_{11} Σ_{i=1}^{k_1} Q^(i) (1−π)^{i−1} + p^d_{01} Q^(1) (1−π)^{k_1}    (5.106)

Compare the expected max-weight between the controller at Q^(1) and at Q^(j):

E[MW_1 − MW_j] = Q^(1) π + Σ_{i=2}^{k_2} Q^(i) π (1−π)^{i−1} p^d_{11} + Q^(2) p^d_{01} (1−π)^{k_2}
  − Q^(1) π p^d_{11} − Σ_{i=2}^{k_1} Q^(i) (1−π)^{i−1} π p^d_{11} − Q^(1) (1−π)^{k_1} p^d_{01}    (5.107)
= Q^(1) π (1 − p^d_{11}) + p^d_{11} Σ_{i=k_1+1}^{k_2} Q^(i) π (1−π)^{i−1}
  + Q^(2) p^d_{01} (1−π)^{k_2} − Q^(1) (1−π)^{k_1} p^d_{01}    (5.108)
≥ Q^(1) π p^d_{10} − Q^(1) (1−π)^{k_1} p^d_{01} + Q^(2) p^d_{01} (1−π)^{k_2}    (5.109)
= Q^(1) π p^d_{10} − Q^(1) π (1−π)^{k_1−1} p^d_{10} + Q^(2) p^d_{01} (1−π)^{k_2}    (5.110)
= Q^(1) π p^d_{10} (1 − (1−π)^{k_1−1}) + Q^(2) p^d_{01} (1−π)^{k_2} ≥ 0    (5.111)

where (5.109) follows from Q^(i) ≥ 0 and π(1 − p^d_{11}) = π p^d_{10}, and (5.110) follows from the identity π p^d_{10} = (1−π) p^d_{01}. Now consider the case where Q^(j) ≥ Q^(1) p^d_{01}.
In this case, there exists a state in which it is optimal to transmit over Q^(j). The expected max-weight is given by

E[MW_j] = π p^d_{11} Σ_{i=1}^{k_j'} Q^(i) (1−π)^{i−1} + Q^(j) π (1−π)^{k_j'}
  + π p^d_{11} Σ_{i=k_j'+1}^{k_1} Q^(i) (1−π)^{i} + p^d_{01} Q^(1) (1−π)^{k_1+1}    (5.112)

Comparing the expected max-weight between the controller at Q^(1) and at Q^(j):

E[MW_1 − MW_j] = Q^(1) π + Σ_{i=2}^{k_2} Q^(i) π (1−π)^{i−1} p^d_{11} + Q^(2) p^d_{01} (1−π)^{k_2} − Q^(1) π p^d_{11}
  − π p^d_{11} Σ_{i=2}^{k_j'} Q^(i) (1−π)^{i−1} − Q^(j) π (1−π)^{k_j'}
  − π p^d_{11} Σ_{i=k_j'+1}^{k_1} Q^(i) (1−π)^{i} − p^d_{01} Q^(1) (1−π)^{k_1+1}    (5.113)
= Q^(1) π p^d_{10} − Q^(j) π (1−π)^{k_j'} p^d_{10} + π p^d_{11} Σ_{i=k_j'+1}^{k_1} Q^(i) π (1−π)^{i−1}
  + p^d_{11} Σ_{i=k_1+1}^{k_2} Q^(i) π (1−π)^{i−1} + Q^(2) p^d_{01} (1−π)^{k_2} − Q^(1) (1−π)^{k_1+1} p^d_{01}    (5.114)

Equation (5.114) follows from combining like terms and breaking the summation over the interval i = [2, k_2] into three intervals, [2, k_j'], [k_j' + 1, k_1], and [k_1 + 1, k_2], together with an additional term for Q^(j). The summations are bounded as follows:

π p^d_{11} Σ_{i=k_j'+1}^{k_1} Q^(i) π (1−π)^{i−1} + p^d_{11} Σ_{i=k_1+1}^{k_2} Q^(i) π (1−π)^{i−1}
≥ π Q^(1) p^d_{01} Σ_{i=k_j'+1}^{k_1} π (1−π)^{i−1} + Q^(2) p^d_{01} Σ_{i=k_1+1}^{k_2} π (1−π)^{i−1}    (5.115)
= π Q^(1) p^d_{01} ( (1−π)^{k_j'} − (1−π)^{k_1} ) + Q^(2) p^d_{01} ( (1−π)^{k_1} − (1−π)^{k_2} )    (5.116)

The inequality in (5.115) follows from the fact that Q^(1) p^d_{01} ≤ Q^(i) p^d_{11} for i ≤ k_1, and Q^(2) p^d_{01} ≤ Q^(i) p^d_{11} for i ≤ k_2.
Plugging this into equation (5.114):

E[MW_1 − MW_j] ≥ Q^(1) π p^d_{10} − Q^(j) π (1−π)^{k_j'} p^d_{10} + Q^(2) p^d_{01} (1−π)^{k_2} − Q^(1) (1−π)^{k_1+1} p^d_{01}
  + π Q^(1) p^d_{01} ( (1−π)^{k_j'} − (1−π)^{k_1} ) + Q^(2) p^d_{01} ( (1−π)^{k_1} − (1−π)^{k_2} )
≥ Q^(1) π p^d_{10} − Q^(j) π (1−π)^{k_j'} p^d_{10} − Q^(1) (1−π)^{k_1+1} p^d_{01} + Q^(2) p^d_{01} (1−π)^{k_1}    (5.117)
≥ Q^(1) π p^d_{10} (1 − (1−π)^{k_1}) − Q^(j) π (1−π)^{k_j'} p^d_{10} + Q^(2) π p^d_{10} (1−π)^{k_1−1}    (5.118)
≥ Q^(2) π p^d_{10} (1 − (1−π)^{k_1}) − Q^(2) π (1−π)^{k_j'} p^d_{10} + Q^(2) π p^d_{10} (1−π)^{k_1−1}    (5.119)
= Q^(2) π p^d_{10} ( 1 − (1−π)^{k_1} − (1−π)^{k_j'} + (1−π)^{k_1−1} ) ≥ 0    (5.120)

The inequality in (5.117) follows from k_j' ≥ k_1 and canceling the Q^(2) terms, (5.118) follows from the identity π p^d_{10} = (1−π) p^d_{01}, and (5.119) holds since Q^(1) ≥ Q^(2) ≥ Q^(j). Therefore, for all j ≥ 2, placing the controller at the node corresponding to Q^(j) results in a lower expected max-weight than placing it at the node corresponding to the longest queue. Thus, placing the controller at the longest queue is the optimal controller placement policy.

5.5.3 Proof of Lemma 16

Lemma 16: Suppose there exists a policy P ∈ Π that stabilizes the system. Then, there exist variables β_r(s) and α_i^r(s') such that (5.14) has a solution with ε* ≥ 0.

Proof. Suppose the system is stabilized by some control policy with functions β(t), which chooses a controller depending on the QLI and on the CSI only through S(t − d_max), and α^r(t), which chooses a link activation based on delayed CSI and QLI, with delays relative to the controller. Without loss of generality, let β_r(t) be an indicator function signaling whether node r is chosen to be the controller at time t, and let α_i^r(t) be an indicator signaling whether link i is scheduled for transmission at time t. Under any such scheme, the following relationship holds between arrivals, departures, and queue backlogs:

Σ_{τ=1}^{t} A_i(τ) ≤ Q_i(t) + Σ_{τ=1}^{t} Σ_{r=1}^{M} µ_i^r(β_r(τ), α_i^r(τ)),    (5.121)

where µ_i^r is the service rate of the i-th queue as a function of the control decisions.
Writing out the expression for µ_i^r yields

Σ_{τ=1}^{t} A_i(τ) ≤ Q_i(t) + Σ_{τ=1}^{t} Σ_{r=1}^{M} β_r(τ) α_i^r(τ) E[ S_i(τ) | S_i(τ − d_r(i)) ].    (5.122)

Let T_S be the subintervals of [0, t] over which S(t − d_max) = S. Further, let T_S^r be the subintervals of T_S over which r is the controller. Lastly, define T_{S,S'}^r to be the subintervals of T_S^r over which the controller observes delayed CSI S(t − d_max) = S and S(t − d_r) = S'. Let |T_S|, |T_S^r|, and |T_{S,S'}^r| be the aggregate lengths of these intervals. Because the arrival and channel state processes are ergodic, and the numbers of channel states and queues are finite, there exists a time t_1 such that for all t ≥ t_1, the empirical average arrival rates and state occupancy fractions are within ε of their expectations:

(1/t) Σ_{τ=1}^{t} A_i(τ) ≥ λ_i − ε    (5.123)
(1/t) |T_{S,S'}^r| ≤ P(S(t − d_max) = S, S(t − d_r) = S') + ε    (5.124)

The above equations hold with probability one by the strong law of large numbers. Furthermore, since the system is stable under the considered policy, [48] shows that there exists a V such that for arbitrarily large t,

P( Σ_{i=1}^{M} Q_i(t) ≤ V ) ≥ 1/2.    (5.125)

Thus, let t be large enough that t ≥ t_1 and V/t ≤ ε. Evaluating the inequality (5.122) at this time t, dividing by t, and assuming Σ_{i=1}^{M} Q_i(t) ≤ V, it follows that

(1/t) Σ_{τ=1}^{t} A_i(τ) ≤ V/t + (1/t) Σ_{τ=1}^{t} Σ_{r=1}^{M} β_r(τ) α_i^r(τ) E[ S_i(τ) | S_i(τ − d_r(i)) ]    (5.126)
λ_i − ε ≤ ε + (1/t) Σ_{τ=1}^{t} Σ_{r=1}^{M} β_r(τ) α_i^r(τ) E[ S_i(τ) | S_i(τ − d_r(i)) ].    (5.127)

The above inequality follows from (5.123) and holds with probability at least 1/2, due to (5.125). We now break up the summation based on the globally delayed CSI S(t − d_max), and then further based on the selected controller, as determined by β_r(τ).
λ_i ≤ 2ε + ∑_{r=1}^M (1/t) ∑_{S∈𝒮} ∑_{τ∈T_S} β_r(τ) α_i(τ) E[S_i(t) | S_i(t − d_r(i))]   (5.128)
    = 2ε + ∑_{S∈𝒮} (|T_S|/t) · (1/|T_S|) ∑_{τ∈T_S} ∑_{r=1}^M β_r(τ) α_i(τ) E[S_i(t) | S_i(t − d_r(i))]   (5.129)
    = 2ε + ∑_{S∈𝒮} (|T_S|/t) ∑_{r=1}^M (|T_S^r|/|T_S|) · (1/|T_S^r|) ∑_{τ∈T_S^r} α_i(τ) E[S_i(t) | S_i(t − d_r(i))]   (5.130)
    = 2ε + ∑_{S∈𝒮} (|T_S|/t) ∑_{r=1}^M β̂_r(S) · (1/|T_S^r|) ∑_{τ∈T_S^r} α_i(τ) E[S_i(t) | S_i(t − d_r(i))]   (5.131)

The last equation follows from defining

β̂_r(S) ≜ |T_S^r| / |T_S|,   (5.132)

the fraction of time that r is chosen as the controller when the delayed state satisfies S(t − d_max) = S. Now, break the summation over T_S^r into separate summations over the sub-intervals T_{S,S'}^r for each observed S(t − d_r(i)) = S'.

λ_i ≤ 2ε + ∑_{S∈𝒮} (|T_S|/t) ∑_{r=1}^M β̂_r(S) ∑_{S'∈𝒮} (1/|T_S^r|) ∑_{τ∈T_{S,S'}^r} α_i(τ) p_{S_i',1}^{d_r(i)}   (5.133)
    = 2ε + ∑_{S∈𝒮} (|T_S|/t) ∑_{r=1}^M β̂_r(S) ∑_{S'∈𝒮} (|T_{S,S'}^r|/|T_S^r|) · (1/|T_{S,S'}^r|) ∑_{τ∈T_{S,S'}^r} α_i(τ) p_{S_i',1}^{d_r(i)}   (5.134)
    = 2ε + ∑_{S∈𝒮} (|T_S|/t) ∑_{r=1}^M β̂_r(S) ∑_{S'∈𝒮} (|T_{S,S'}^r|/|T_S^r|) ᾱ_i^r(S, S') p_{S_i',1}^{d_r(i)}   (5.135)
    = 2ε + ∑_{r=1}^M ∑_{S∈𝒮} ∑_{S'∈𝒮} (|T_S|/t) (|T_{S,S'}^r|/|T_S^r|) β̂_r(S) ᾱ_i^r(S, S') p_{S_i',1}^{d_r(i)}   (5.136)
    ≤ ∑_{S∈𝒮} ∑_{S'∈𝒮} P(S(t − d_max) = S, S(t − d_r(i)) = S') ∑_{r=1}^M β̂_r(S) ᾱ_i^r(S, S') p_{S_i',1}^{d_r(i)} + (2 + |𝒮|²)ε   (5.137)

where (5.135) follows by defining

ᾱ_i^r(S, S') ≜ (1/|T_{S,S'}^r|) ∑_{τ∈T_{S,S'}^r} α_i(τ)   (5.138)

and (5.137) follows from (5.124). Because the original control functions satisfy ∑_r β_r(t) ≤ 1 and ∑_i α_i(t) ≤ 1, it follows that β̂_r and ᾱ_i^r satisfy these same constraints. Furthermore, the fraction of time node r is the controller, β̂_r, depends on channel state information only through S(t − d_max). The link schedule variable ᾱ_i^r is a stationary probability as a function of both S(t − d_max) and S(t − d_r(i)); however, due to the Markov property of the system, the optimal policy does not depend on the older CSI.
Inequality (5.137) holds with probability greater than 1/2, implying that there exists a set of stationary control decisions β_r(S) and α_i^r(S, S') satisfying the constraints in (5.137) for all i. If there were no such stationary policy, then this inequality would hold with probability 0. Therefore, λ is arbitrarily close to a point in the region Λ, implying the constraints imposed by Λ are necessary for system stability.

5.5.4 Proof of Theorem 23

Theorem 23. Consider the modified DCPS policy, which operates in two steps. First, choose a controller by solving the following optimization as a function of the delayed queue backlogs Q(t − τ_Q) and delayed CSI S(t − d_max):

r* = argmax_r ∑_{s∈𝒮} P(S(t − d_r(i)) = s | S(t − d_max)) max_i Q_i(t − τ_Q) p_{s_i,1}^{d_r(i)}   (5.139)

The controller observes CSI S(t − d_{r*}(i)) = s, and schedules the following queue to transmit:

i* = argmax_i Q_i(t − τ_Q) p_{s_i,1}^{d_{r*}(i)}   (5.140)

The DCPS policy in (5.40) and (5.41) is throughput optimal if τ_Q > d_max.

Proof of Theorem 23. The proof of this theorem follows the same structure as the proof of Theorem 21. We use the same Lyapunov function as in (5.46) and the drift expression in (5.47). We bound the Lyapunov drift under DCPS by combining (5.32), (5.46), and (5.47), and show that for large queue lengths the Lyapunov drift is negative. Let D_i(t) = D_i^DCPS(t) refer to the departure process of the DCPS policy. Recall from (5.59) that the Lyapunov drift is bounded as

Δ_T(Y(t)) ≤ B' + E[ ∑_{k=0}^{T−1} ∑_{i=1}^M Q_i(t + k − τ_Q)(λ_i − D_i(t + k)) | Y(t) ]   (5.141)

Now consider the last term on the right-hand side of the above equation. This expectation is rewritten by conditioning on the delayed QLI at the current slot t + k, as well as the globally available delayed CSI, and using the law of iterated expectations.
E[ ∑_{k=0}^{T−1} ∑_{i=1}^M Q_i(t + k − τ_Q)(λ_i − D_i(t + k)) | Y(t) ]
    = E[ ∑_{k=0}^{T−1} E[ ∑_{i=1}^M Q_i(t + k − τ_Q)(λ_i − D_i(t + k)) | S(t + k − d_max), Q(t + k − τ_Q) ] | Y(t) ]   (5.142)

Note that (5.142) differs from (5.61) because of the extra conditioning on the channel state. As in the proof of Theorem 21, we break the summation over the T slots into two parts: a smaller number of slots for which the value of k is small, and a larger number of slots for which the value of k is large. An overly conservative bound is used for k < T_SS + d_max, but the frame size T is chosen to ensure that the first T_SS + d_max slots are a small fraction of the overall T slots. We drop the argument of the function T_SS(ε), but the dependence on ε is clear.

Δ_T(Y(t)) ≤ B' + E[ ∑_{k=0}^{T−1} ∑_{i=1}^M Q_i(t + k − τ_Q)(λ_i − D_i(t + k)) | Y(t) ]
    ≤ B' + ∑_{k=0}^{T_SS+d_max−1} E[ ∑_{i=1}^M Q_i(t + k − τ_Q)(λ_i − D_i(t + k)) | Y(t) ]
        + ∑_{k=T_SS+d_max}^{T−1} E[ ∑_{i=1}^M Q_i(t + k − τ_Q)(λ_i − D_i(t + k)) | Y(t) ]   (5.143)

For values of k < T_SS + d_max, the upper bound in (5.65) holds. Consider time slots k ≥ T_SS + d_max, for which the interior expectation in (5.142) is expanded as

E[ ∑_{i=1}^M Q_i(t + k − τ_Q)(λ_i − D_i(t + k)) | S(t + k − d_max), Q(t + k − τ_Q), Y(t) ]   (5.144)
    = ∑_{i=1}^M Q_i(t + k − τ_Q)( λ_i − E[D_i(t + k) | S(t + k − d_max), Q(t + k − τ_Q), Y(t)] )   (5.145)
    = ∑_{i=1}^M Q_i(t + k − τ_Q)λ_i − ∑_{i=1}^M Q_i(t + k − τ_Q) E[D_i(t + k) | S(t + k − d_max), Q(t + k − τ_Q), Y(t)].   (5.146)

Consider the right-most term in equation (5.146). Let φ_i^r be a binary indicator variable denoting whether queue i is scheduled under the DCPS policy as a function of the delayed QLI and the delayed CSI from controller r, and let ψ_r be an indicator variable denoting whether node r is the controller, as a function of delayed QLI and globally delayed CSI. Let Q = Q(t + k − τ_Q) in the following.
∑_{i=1}^M Q_i E[D_i(t + k) | S(t + k − d_max), Q, Y(t)]
    = ∑_{i=1}^M ∑_{r=1}^M Q_i E[ ψ_r(S(t + k − d_max), Q) φ_i^r(S(t + k − d_r), Q) S_i(t + k) | S(t + k − d_max), Q, Y(t) ]   (5.147)
    = ∑_{i=1}^M ∑_{r=1}^M ψ_r(S(t + k − d_max), Q) Q_i E[ E[ φ_i^r(S(t + k − d_r), Q) S_i(t + k) | S_i(t + k − d_r(i)) ] | S(t + k − d_max), Q, Y(t) ]   (5.148)
    = ∑_{i=1}^M ∑_{r=1}^M ψ_r(S(t + k − d_max), Q) Q_i E[ φ_i^r(S(t + k − d_r), Q) E[ S_i(t + k) | S_i(t + k − d_r(i)) ] | S(t + k − d_max), Q, Y(t) ]   (5.149)
    = ∑_{i=1}^M ∑_{r=1}^M ψ_r(S(t + k − d_max), Q) Q_i E[ φ_i^r(S(t + k − d_r), Q) p_{S_i(t+k−d_r),1}^{d_r(i)} | S(t + k − d_max), Q, Y(t) ]   (5.150)
    = ∑_{i=1}^M ∑_{r=1}^M ψ_r(S(t + k − d_max), Q(t + k − τ_Q)) Q_i(t + k − τ_Q) E[ φ_i^r(S(t + k − d_r), Q(t + k − τ_Q)) p_{S_i(t+k−d_r),1}^{d_r(i)} | S(t + k − d_max) ]   (5.151)

Equation (5.148) follows since the controller placement under the DCPS policy is completely determined by the delayed QLI and globally delayed CSI, and then applying the law of iterated expectations. Equation (5.149) follows since the link schedule under DCPS is completely determined given delayed QLI (Q(t + k − τ_Q)) and delayed CSI (S(t + k − d_r)). Equation (5.150) follows using the multi-step transition probability of the Markov chain. Lastly, equation (5.151) follows because the locally delayed CSI S(t + k − d_r) is conditionally independent of Y(t) and Q(t + k − τ_Q) given the globally delayed CSI S(t + k − d_max), since k ≥ d_max and τ_Q > d_max. Note that we do not need a large τ_Q to remove the conditioning, as we did in Theorem 21.

Now, similarly to the proof of Theorem 21, we compare the DCPS policy to the STAT policy in Lemma 16, which is known to stabilize the system. Under the DCPS policy, the following simplification is made for the service rate given a controller and a delayed CSI observation:

∑_{i=1}^M φ_i^r(s, Q(t + k − τ_Q)) Q_i(t + k − τ_Q) p_{s_i,1}^{d_r(i)} = max_i Q_i(t + k − τ_Q) p_{s_i,1}^{d_r(i)}.   (5.152)
Similarly, the expression for the expected value of the departure process is rewritten using (5.152) and the structure of the controller placement policy of DCPS:

∑_{r=1}^M ψ_r(Q(t + k − τ_Q)) E_{S(t+k−d_r)}[ max_i Q_i(t + k − τ_Q) p_{S_i(t+k−d_r),1}^{d_r(i)} | S(t + k − d_max) ]
    = max_r ∑_{s∈𝒮} P(S(t + k − d_r) = s | S(t + k − d_max)) max_i Q_i(t + k − τ_Q) p_{s_i,1}^{d_r(i)}   (5.153)

Combining equations (5.153) and (5.151), and plugging this into equation (5.146), yields

E[ ∑_{i=1}^M Q_i(t + k − τ_Q)λ_i − ∑_{i=1}^M Q_i(t + k − τ_Q) E[D_i(t + k) | S(t + k − d_max), Q(t + k − τ_Q)] | Y(t) ]
    ≤ E[ ∑_{i=1}^M Q_i(t + k − τ_Q)λ_i − max_r ∑_{s∈𝒮} P(S(t + k − d_r) = s | S(t + k − d_max)) max_i Q_i(t + k − τ_Q) p_{s_i,1}^{d_r(i)} | Y(t) ]   (5.154)
    ≤ ∑_{i=1}^M Q_i(t − τ_Q)λ_i + εMk − E[ max_r ∑_{s'∈𝒮} P(S(t + k − d_r) = s' | S(t + k − d_max)) max_i Q_i(t − τ_Q) p_{s',1}^{d_r(i)} | Y(t) ] + εk   (5.155)
    ≤ ∑_{i=1}^M Q_i(t − τ_Q)λ_i + (M + 1)εk − ∑_{s∈𝒮} P(S(t + k − d_max) = s | Y(t)) max_r ∑_{s'∈𝒮} P(S(t + k − d_r) = s' | S(t + k − d_max) = s) max_i Q_i(t − τ_Q) p_{s',1}^{d_r(i)}   (5.156)
    ≤ ∑_{i=1}^M Q_i(t − τ_Q)λ_i + (M + 1)εk + (ε/2) ∑_{i=1}^M Q_i(t − τ_Q) − ∑_{s∈𝒮} P(S(t + k − d_max) = s) max_r ∑_{s'∈𝒮} P(S(t + k − d_r) = s' | S(t + k − d_max) = s) max_i Q_i(t − τ_Q) p_{s',1}^{d_r(i)}   (5.157)

Equation (5.155) follows from (5.51). Equation (5.157) follows from the definition of T_SS in (5.35), which removes the conditioning on Y(t). By Lemma 16, for any λ ∈ Λ, there exists a stationary policy which assigns controller r with probability β_r(s) and schedules node i for transmission with probability α_i^r(s') for delayed channel state information s, s' ∈ 𝒮, which satisfies

λ_i + ε ≤ ∑_{s∈𝒮} P(S(t − d_max) = s) ∑_{r=1}^M β_r(s) ∑_{s'∈𝒮} P(S(t − d_r(i)) = s' | S(t − d_max) = s) α_i^r(s') p_{s',1}^{d_r(i)}   ∀i ∈ {1, …, M}   (5.158)

Define μ_i^STAT to be the average departure rate of queue i under this stationary policy.
In other words,

μ_i^STAT ≜ ∑_{s∈𝒮} P(S(t − d_max) = s) ∑_{r=1}^M β_r(s) ∑_{s'∈𝒮} P(S(t − d_r(i)) = s' | S(t − d_max) = s) α_i^r(s') p_{s',1}^{d_r(i)}   (5.159)

The expression in (5.154) is rewritten by adding and subtracting identical terms corresponding to the stationary policy μ^STAT:

∑_{i=1}^M Q_i(t − τ_Q)λ_i − E_{S(t−d_max)}[ max_r ∑_{s∈𝒮} P(S(t − d_r) = s | S(t − d_max)) max_i Q_i(t − τ_Q) p_{s_i,1}^{d_r(i)} ]
        + (ε/2) ∑_{i=1}^M Q_i(t − τ_Q) + ∑_{i=1}^M Q_i(t − τ_Q)μ_i^STAT − ∑_{i=1}^M Q_i(t − τ_Q)μ_i^STAT + (M + 1)εk   (5.160)
    = ∑_{i=1}^M Q_i(t − τ_Q)(λ_i − μ_i^STAT) − E_{S(t−d_max)}[ max_r ∑_{s∈𝒮} P(S(t − d_r) = s | S(t − d_max)) max_i Q_i(t − τ_Q) p_{s_i,1}^{d_r(i)} ]
        + (ε/2) ∑_{i=1}^M Q_i(t − τ_Q) + ∑_{i=1}^M Q_i(t − τ_Q)μ_i^STAT + (M + 1)εk   (5.161)

The first term in (5.161) is bounded using the fact that λ ∈ Λ implies (5.158). Thus,

∑_{i=1}^M Q_i(t − τ_Q)(λ_i − μ_i^STAT) ≤ −ε ∑_{i=1}^M Q_i(t − τ_Q)   (5.162)

The term corresponding to the stationary policy in (5.161) is bounded as follows:

∑_{i=1}^M Q_i(t − τ_Q)μ_i^STAT
    = ∑_{s∈𝒮} P(S(t − d_max) = s) ∑_{r=1}^M β_r(s) ∑_{s'∈𝒮} P(S(t − d_r(i)) = s' | S(t − d_max) = s) ∑_{i=1}^M Q_i(t − τ_Q) α_i^r(s') p_{s',1}^{d_r(i)}   (5.163)-(5.164)
    ≤ ∑_{s∈𝒮} P(S(t − d_max) = s) ∑_{r=1}^M β_r(s) ∑_{s'∈𝒮} P(S(t − d_r(i)) = s' | S(t − d_max) = s) max_i Q_i(t − τ_Q) p_{s',1}^{d_r(i)}   (5.165)
    ≤ E_{S(t−d_max)}[ max_r ∑_{s'∈𝒮} P(S(t − d_r(i)) = s' | S(t − d_max)) max_i Q_i(t − τ_Q) p_{s',1}^{d_r(i)} ]   (5.166)

Returning to (5.141) and applying the inequalities in (5.166) and (5.162):

E[ E[ ∑_{i=1}^M Q_i(t + k − τ_Q)(λ_i − D_i(t + k)) | Q(t + k − τ_Q), S(t + k − d_max) ] | Y(t) ]
    ≤ −ε ∑_{i=1}^M Q_i(t − τ_Q) + (ε/2) ∑_{i=1}^M Q_i(t − τ_Q) + ∑_{i=1}^M Q_i(t − τ_Q)μ_i^STAT + (M + 1)εk
        − E_S[ max_r ∑_{s∈𝒮} P(S(t − d_r) = s | S(t − d_max)) max_i Q_i(t − τ_Q) p_{s_i,1}^{d_r(i)} ]   (5.167)
    ≤ −(ε/2) ∑_{i=1}^M Q_i(t − τ_Q) + (M + 1)εk   (5.168)

This new bound applies to slots k ≥ T_SS + d_max, and can be combined with the bound in (5.65) to bound the drift term in
(5.143):

Δ_T(Y(t)) ≤ B' + (T_SS + d_max) ∑_{i=1}^M Q_i(t − τ_Q) + (1/2)(T_SS + d_max)² M
        + ∑_{k=T_SS+d_max}^{T−1} E[ −(ε/2) ∑_{i=1}^M Q_i(t − τ_Q) + (M + 1)εk | Y(t) ]   (5.169)
    ≤ B' + (T_SS + d_max) ∑_{i=1}^M Q_i(t − τ_Q) + (1/2)(T_SS + d_max)² M
        − (ε/2)(T − T_SS − d_max) ∑_{i=1}^M Q_i(t − τ_Q) + ∑_{k=T_SS+d_max}^{T−1} (M + 1)εk   (5.170)
    ≤ B' + (T_SS + d_max) ∑_{i=1}^M Q_i(t − τ_Q) + (1/2)(T_SS + d_max)² M
        − (ε/2)(T − T_SS − d_max) ∑_{i=1}^M Q_i(t − τ_Q) + (1/2)(M + 1)ε(T² − (T_SS + d_max)²)   (5.171)
    ≤ B' + (1/2)(T_SS + d_max)² M(1 − ε) + (1/2)(M + 1)εT²
        + (T_SS + d_max) ∑_{i=1}^M Q_i(t − τ_Q) − (ε/2)(T − T_SS − d_max) ∑_{i=1}^M Q_i(t − τ_Q)   (5.172)

Thus, for any ξ > 0, T satisfying

T ≥ 2(1 + 2/ε)(T_SS + d_max) + 2ξ/ε   (5.173)

and positive constant K satisfying

K = B' + (1/2)(T_SS + d_max)² M(1 − ε) + (1/2)(M + 1)εT²,   (5.174)

it follows that

Δ_T(Y(t)) ≤ K − ξ ∑_{i=1}^M Q_i(t − τ_Q).   (5.175)

Thus, for large enough queue backlogs, the T-slot Lyapunov drift is negative, and from [48] it follows that the overall system is stable.

Chapter 6

Scheduling over Time Varying Channels with Hidden State Information

Consider the scheduling problem in a wireless downlink where channel state information (CSI) is unavailable at the base station, as in Figure 6-1. Packets arrive to the base station and are placed in queues to await transmission to their respective destinations. Due to wireless interference, only one transmission can be scheduled in each time slot. Therefore, the base station must schedule transmissions such that the queue lengths at the base station remain stable. Furthermore, the channels to the users are independent, but evolve over time according to a Markov process. Ideally, the transmitter would opportunistically schedule channels yielding a high transmission rate; however, CSI is not available to the transmitter. Throughput optimal scheduling was pioneered by Tassiulas and Ephremides in [62], and has been studied in a variety of contexts.
The optimal policies depend on the channel model and on the information available to the transmitter, as summarized in Table 6.1. If the channel state process is IID and no CSI is available, then any work-conserving policy is throughput optimal; for the purpose of comparison, we define the throughput optimal policy in this scenario to be the one that schedules the longest queue. If the transmitter has current CSI and queue length information (QLI), the throughput optimal policy is to transmit over the channel that maximizes the product of the channel rate and the queue length at the current time [48, 62]. If the CSI and QLI are delayed, Ying and Shakkottai show that the optimal policy schedules the node with the largest product of the delayed QLI and the conditional expectation of the channel rate at the current time, given the delayed CSI [69]. If the CSI is not acquired until an acknowledgement is received from the transmission, then the throughput optimal policy is to transmit over the channel that maximizes the product of the belief of the channel and the queue backlog [32]. While throughput optimal scheduling has been studied in a variety of contexts, to the best of our knowledge, there have been no results on throughput optimal scheduling when the controller has QLI but not CSI, and the channel process has memory. In fact, Tassiulas and Ephremides state that "An interesting variation of the problem... is the case where the connectivity information is not available for the decision making and the server allocation can be based on queue lengths... The study of stability and optimal delay performance in [the] case of dependent connectivities are open problems for further investigation." [62]. Due to the memory in the channel state process, the throughput-optimal policy takes a non-trivial form, and the results from [62] and [48] cannot be directly applied.
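For concreteness, the scheduling rules surveyed above can be written as small selection functions for ON/OFF channels. This is an illustrative sketch, not code from the thesis; the function names and the two-state parameterization p = P(OFF→ON), q = P(ON→OFF) are assumptions of the sketch:

```python
def schedule_full_csi(Q, S):
    # Full CSI: serve argmax_i Q_i(t) * S_i(t)  [48, 62].
    return max(range(len(Q)), key=lambda i: Q[i] * S[i])

def schedule_delayed_csi(Q_delayed, S_delayed, tau, p, q):
    # Delayed CSI and QLI [69]: serve argmax_i Q_i(t-tau) * E[S_i(t) | S_i(t-tau)].
    pi = p / (p + q)                                     # steady-state P(ON)
    exp_state = lambda s: pi + (1.0 - p - q) ** tau * (s - pi)
    return max(range(len(Q_delayed)),
               key=lambda i: Q_delayed[i] * exp_state(S_delayed[i]))

def schedule_no_csi(Q_delayed):
    # This chapter's DLQ rule: serve the longest (sufficiently delayed) queue.
    return max(range(len(Q_delayed)), key=lambda i: Q_delayed[i])
```

Note how the delayed-CSI rule interpolates between the other two: for tau = 0 it reduces to the full-CSI rule, while as tau grows the conditional expectation approaches the steady state and the queue lengths dominate the decision.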
In this chapter, we consider a scenario in which QLI is readily available to the transmitter, but no CSI is available. In this case, the throughput optimal policy is to schedule the node with the longest queue length, using significantly delayed QLI. We characterize the throughput region for the case when CSI is not available at the transmitter. Then, we propose the Delayed Longest Queue (DLQ) policy, and prove that it is throughput optimal over all transmission policies without access to CSI. Lastly, we provide simulation results to support the theoretical results on the optimality of delayed QLI.

6.1 System Model

Consider a system of M nodes, representing a wireless downlink, as in Figure 6-1. Packets arrive externally at the base station, and are destined for node i according to an i.i.d. Bernoulli arrival process A_i(t) of rate λ_i. Packets are stored in separate queues at the base station, based on the destination node, to await transmission.

Table 6.1: Throughput optimal policies for different system models. Each column corresponds to a different amount of information at the controller; each row corresponds to the memory in the channel. S(t) is the channel state at the current slot, and Q(t) is the queue backlog.

    Model            | No CSI                  | Delayed CSI                           | Full CSI
    IID channels     | max_i Q_i(t) E[S_i(t)]  | max_i Q_i(t) E[S_i(t)]                | max_i Q_i(t) S_i(t)
    Markov channels  | *This work*             | max_i Q_i(t−τ) E[S_i(t) | S_i(t−τ)]   | max_i Q_i(t) S_i(t)

Figure 6-1: Wireless downlink. Packets arrive at rates λ_1, …, λ_M and are queued at the base station (BS) in backlogs Q_1(t), …, Q_M(t); the BS transmits to receivers R_1, …, R_M over channels with states S_1(t), …, S_M(t).

Let Q_i(t) be the packet backlog corresponding to node i at time t. Due to wireless interference, the base station is able to transmit to only one node at a time, although this model can easily be extended to allow multiple transmissions per slot. Each node is connected to the base station through an independent time-varying channel. Let S_i(t) ∈ {OFF, ON} be the state of the channel at node i at time t.
Assume the channel states evolve over time according to the Markov chain shown in Figure 6-2: from state 0 (OFF) the channel moves to state 1 (ON) with probability p, and from ON back to OFF with probability q. If a packet for node i is scheduled for transmission and S_i(t) = ON, then, assuming there are packets awaiting transmission, the packet is successfully transmitted and departs the system. On the other hand, if the channel at node i is OFF, then the transmission fails, and the packet remains in the system. Let P_{S_i}(1) and P_{S_i}(0) be the steady-state probabilities of channel i being ON or OFF, respectively.

Figure 6-2: Markov chain describing the channel state evolution of each independent channel. State 0 corresponds to an OFF channel, while state 1 corresponds to an ON channel; the chain transitions from 0 to 1 with probability p and from 1 to 0 with probability q.

The base station has access to the history of queue lengths for each node i; however, the current channel states are known only by the respective receivers, and not by the base station. Therefore, the base station makes a transmission decision based on QLI, but not CSI.¹ Let Π be the set of transmission policies which do not use CSI. The primary objective is to schedule transmissions such that the system of queues is stable. In this work, we characterize the throughput region of the system above, and propose a throughput optimal scheduling policy using delayed QLI.

6.2 Throughput Region

The throughput region is computed by solving the following linear program (LP):

    Maximize:    ε
    Subject to:  λ_i + ε ≤ α_i P_{S_i}(1)   ∀i ∈ {1, …, M}
                 ∑_{i=1}^M α_i ≤ 1
                 α_i ≥ 0   ∀i ∈ {1, …, M}   (6.1)

In the above LP, α_i represents the fraction of time the base station schedules node i for transmission. To maintain stability, the arrival rate to each queue must be less than the service rate at that queue, which is a function of α_i and the statistics of the channel.

¹We assume packet acknowledgements occur at a separate layer, and cannot be used to predict the channel state.
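Because any slack in the α_i of (6.1) could be reassigned to increase ε, every constraint is tight at the optimum, which yields a closed-form solution of the LP. The sketch below assumes the resulting α_i come out nonnegative; `lp_margin` is a hypothetical helper name, not from the thesis:

```python
def lp_margin(lam, p_on):
    # Maximal eps in (6.1): set alpha_i = (lam_i + eps) / P_Si(1) for every i
    # (all rate constraints tight) and sum_i alpha_i = 1, then solve for eps.
    eps = (1.0 - sum(l / p for l, p in zip(lam, p_on))) \
        / sum(1.0 / p for p in p_on)
    alpha = [(l + eps) / p for l, p in zip(lam, p_on)]
    return eps, alpha
```

An arrival vector λ lies in the throughput region exactly when the returned margin is nonnegative. For four symmetric queues with p = q (so P_{S_i}(1) = 1/2), the margin hits zero at λ = 0.125, the stability-region boundary quoted in Section 6.4.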
Thus, the throughput region Λ is the set of all non-negative arrival rate vectors λ such that (6.1) has a feasible solution with ε* ≥ 0. The proof that Λ is the throughput region is given below.

Theorem 25 (Throughput Region). For any non-negative arrival rate vector λ, the system can be stabilized by some policy P ∈ Π if and only if λ ∈ Λ.

Necessity is shown in Lemma 17, and sufficiency is shown in Theorem 26 by proposing a throughput optimal scheduling policy and proving that for all λ ∈ Λ, that policy stabilizes the system.

Lemma 17. Suppose there exists a scheduling policy P ∈ Π that stabilizes the system without using CSI. Then there exist α_i such that (6.1) has a solution with ε* ≥ 0.

Proof. Consider the stabilizing policy P ∈ Π, consisting of control functions α_i(t) which choose a link to activate at each time. Note that this policy must be independent of CSI. Without loss of generality, let α_i(t) be an indicator function equal to 1 if link i is scheduled for transmission at time t. Under any such scheme, the following relationship holds between arrivals, departures, and backlogs for each queue:

∑_{τ=1}^t A_i(τ) ≤ Q_i(t) + ∑_{τ=1}^t μ_i(α_i(τ)),   (6.2)

where μ_i is the service rate of the i-th queue as a function of the control decisions. Writing out the expression for μ_i in terms of the decision variables α_i(t) yields

∑_{τ=1}^t A_i(τ) ≤ Q_i(t) + ∑_{τ=1}^t α_i(τ) P_{S_i}(1).   (6.3)

Since the arrival and channel state processes are ergodic, and the number of channel states and queues is finite, there exists a time t_1 such that for all t ≥ t_1, the empirical average arrival rate is within ε/2 of its expectation:

(1/t) ∑_{τ=1}^t A_i(τ) ≥ λ_i − ε/2   (6.4)

The above holds with probability 1 by the strong law of large numbers. Furthermore, since the system is stable under the policy P, [48] shows that there exists a V such that for arbitrarily large t,

P( ∑_{i=1}^M Q_i(t) ≤ V ) ≥ 1/2.   (6.5)
Thus, let t be a large time index such that t ≥ t_1 and V/t ≤ ε/2. If ∑_{i=1}^M Q_i(t) ≤ V, the inequality in (6.3) can be rewritten by dividing by t:

(1/t) ∑_{τ=1}^t A_i(τ) ≤ V/t + (1/t) ∑_{τ=1}^t α_i(τ) P_{S_i}(1)   (6.6)

λ_i − ε/2 ≤ (1/t) ∑_{τ=1}^t A_i(τ) ≤ ε/2 + (1/t) ∑_{τ=1}^t α_i(τ) P_{S_i}(1)   (6.7)

λ_i − ε ≤ ᾱ_i P_{S_i}(1)   (6.8)

The lower bound in (6.7) follows from (6.4), and equation (6.8) follows from defining ᾱ_i = (1/t) ∑_{τ=1}^t α_i(τ). Inequality (6.8) assumes ∑_{i=1}^M Q_i(t) ≤ V, and holds with probability greater than 1/2 by (6.5), implying that there exists a set of stationary control decisions α_i satisfying the necessary constraints such that (6.8) holds for all i. If there were no such stationary policy, then this inequality would hold with probability 0. Therefore, λ is arbitrarily close to a point in the region Λ, implying the constraints imposed by Λ are necessary for system stability.

Lemma 17 shows that for all λ ∈ Λ, there exists a stationary policy STAT ∈ Π that stabilizes the system by scheduling link i with probability α_i. However, the correct value of α_i relies on knowledge of the arrival rates to the system. In the following section, we develop a scheduling policy based on delayed QLI that stabilizes the system without requiring knowledge of the arrival rates.

6.3 Dynamic QLI-Based Scheduling Policy

Consider a scheduling policy P ∈ Π. Let D_i^P(t) be the departure process of queue i, such that D_i^P(t) = 1 if there is a departure from queue i at time t under policy P. Consider the evolution of the queues over T time slots subject to a scheduling policy P:

Q_i(t + T) ≤ [ Q_i(t) − ∑_{k=0}^{T−1} D_i^P(t + k) ]⁺ + ∑_{k=0}^{T−1} A_i(t + k)   (6.9)

Equation (6.9) is an inequality rather than an equality due to the assumption that the departures are taken from the backlog at the beginning of the T-slot period, and the arrivals occur at the end of the T slots. Under this assumption, the packets that arrive within the T-slot period cannot depart within this period.
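The bound (6.9) can be spot-checked numerically by simulating a single queue slot by slot (serve first, then admit arrivals). This is only an illustrative check, not part of the proof:

```python
import random

rng = random.Random(0)
T = 8
for _ in range(1000):
    Q0 = rng.randrange(5)                        # initial backlog Q_i(t)
    A = [rng.randrange(2) for _ in range(T)]     # arrivals A_i(t+k)
    D = [rng.randrange(2) for _ in range(T)]     # offered service D_i^P(t+k)
    Q = Q0
    for a, d in zip(A, D):
        Q = max(Q - d, 0) + a                    # serve, then admit arrivals
    # (6.9): Q_i(t+T) <= [Q_i(t) - sum_k D_i^P(t+k)]^+ + sum_k A_i(t+k)
    assert Q <= max(Q0 - sum(D), 0) + sum(A)
```

The bound is loose exactly when offered service is wasted on an empty queue: the right-hand side pretends all arrivals come after all departures.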
The square of the queue backlog can be bounded using the inequality in (6.9):

Q_i²(t + T) ≤ Q_i²(t) + ( ∑_{k=0}^{T−1} A_i(t + k) )² + ( ∑_{k=0}^{T−1} D_i^P(t + k) )²
        + 2Q_i(t)( ∑_{k=0}^{T−1} A_i(t + k) − ∑_{k=0}^{T−1} D_i^P(t + k) )   (6.10)

The above bound follows from A_i(t) ≥ 0 and D_i(t) ≥ 0. Due to the ergodicity of the finite-state Markov chain governing the channel state process, for any ε > 0 there exists a τ_Q such that the probability of the channel state, conditioned on the channel state τ_Q slots in the past, is within ε/2 of the steady-state probability of the Markov chain:

| P(S(t) = s | S(t − τ_Q(ε))) − P(S(t) = s) | ≤ ε/2   (6.11)

Note that τ_Q(ε) is related to the mixing time of the Markov chain. In general, the Markov chain approaches steady state exponentially fast, at a rate of p + q [21]. Theorem 26 proposes the Delayed Longest Queue (DLQ) scheduling policy, which stabilizes the network whenever the input rate vector is interior to the capacity region Λ. Note that this proves sufficiency in Theorem 25.

Theorem 26. Consider the Delayed Longest Queue (DLQ) scheduling policy, which at time t schedules the channel which had the longest queue length at time t − τ_Q(ε), where τ_Q(ε) is defined in (6.11). For any arrival rate λ and ε > 0 satisfying λ + ε·1 ∈ Λ, the DLQ policy stabilizes the system.

The DLQ policy transmits a packet from the longest queue using delayed queue length information. If fresher QLI is available, it cannot be used by the DLQ policy to stabilize the system. This is because at time t, the queue with the largest backlog Q_i(t) is also likely to have an OFF channel. Scheduling the longest queue therefore targets channels that are OFF, so the queue backlogs are not decreased, and the system grows unstable. On the other hand, if sufficiently delayed QLI is used in the DLQ policy, then the QLI is independent of the current channel state, because the state process reaches its steady-state distribution over the τ_Q slots that the QLI is delayed.
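Since the two-state chain approaches steady state geometrically at rate 1 − p − q, the delay τ_Q(ε) defined implicitly by (6.11) can be computed directly. A sketch, assuming the two-state channel of Figure 6-2; `tau_q` is a hypothetical helper, and the worst-case deviation max(π, 1−π) at lag zero is used:

```python
def tau_q(eps, p, q):
    # Smallest tau with |P(S(t)=s | S(t-tau)) - P(S(t)=s)| <= eps/2 for the
    # two-state chain: the deviation after tau steps is
    # |1-p-q|^tau * |1{s'=1} - pi|, at most |1-p-q|^tau * max(pi, 1-pi).
    pi = p / (p + q)
    dev = max(pi, 1.0 - pi)       # worst-case deviation at lag zero
    rho = abs(1.0 - p - q)        # second eigenvalue of the chain
    tau = 0
    while dev > eps / 2.0:
        dev *= rho
        tau += 1
    return tau
```

For example, tau_q(0.1, 0.01, 0.01) evaluates to 114 slots while tau_q(0.1, 0.1, 0.1) evaluates to 11, consistent with the qualitative finding of the simulations: the slowly mixing channel needs a far larger QLI delay than the faster one.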
Therefore, the base station schedules queues for which the backlog is long, without favoring OFF channels.

6.4 Simulation Results

In this section, we simulate a system of four queues and apply the DLQ policy for different values of the QLI delay τ_Q. We plot the average queue backlog over 100,000 time slots for different symmetric arrival rates.² For small arrival rates, the average queue length remains small. As the arrival rate increases, the backlog slowly increases until a certain point, after which the backlog greatly increases. This point represents the boundary of the throughput region; for arrival rates outside of this region, the system of queues cannot be stabilized. For a system of four queues with symmetric channel transition probabilities p = q, the boundary of the stability region on the symmetric arrival rate line is given by (1/4) · (1/2) = 0.125. Therefore, under the throughput optimal policy, the queue lengths should remain bounded for arrival rates λ < 0.125.

²A symmetric arrival rate implies that each node sees the same arrival rate.

Figure 6-3 shows the results for transition probabilities p = q = 0.01, and Figure 6-4 shows the results for p = q = 0.1. As shown in Figure 6-3, when the QLI is insufficiently delayed, the system becomes unstable before the boundary of the stability region (0.125). For τ_Q = 1, the system becomes unstable at λ = 0.03. This represents a 75% reduction in the stability region. As τ_Q increases, the maximum arrival rate supportable by the DLQ policy increases. At τ_Q = 150, it is clear the system is stable for all arrival rates within the stability region. Similar results are shown in Figure 6-4 for a channel with less memory. In this case, the attainable throughput of the DLQ policy is less sensitive to the magnitude of the delays in QLI. The simulation results suggest that τ_Q = 100 is sufficient to achieve the full throughput region.
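The experiment in this section can be reproduced with a short simulation. This is an illustrative sketch rather than the thesis's actual code: `simulate_dlq`, its defaults, and the run length are assumptions, chosen small enough to run quickly:

```python
import random

def simulate_dlq(lam, tau_q, p, q, M=4, T=20000, seed=1):
    # Serve the queue that was longest tau_q slots ago (the DLQ policy);
    # returns the time-averaged total backlog.  Requires tau_q >= 1.
    rng = random.Random(seed)
    pi = p / (p + q)
    Q = [0] * M
    S = [1 if rng.random() < pi else 0 for _ in range(M)]  # start near steady state
    history = [list(Q)]                                    # buffer of past QLI
    total = 0
    for _ in range(T):
        # each channel evolves as an independent two-state Markov chain
        S = [(1 if rng.random() < p else 0) if s == 0
             else (0 if rng.random() < q else 1) for s in S]
        delayed = history[0]                               # QLI ~tau_q slots old
        i = max(range(M), key=lambda j: delayed[j])
        if S[i] == 1 and Q[i] > 0:
            Q[i] -= 1                                      # successful transmission
        for j in range(M):
            if rng.random() < lam:
                Q[j] += 1                                  # Bernoulli(lam) arrivals
        history.append(list(Q))
        if len(history) > tau_q:
            history.pop(0)
        total += sum(Q)
    return total / T
```

With p = q = 0.1 and τ_Q = 20, an arrival rate λ = 0.05 (inside Λ) keeps the average backlog small, while λ = 0.2 (outside Λ) produces a backlog that grows with the run length.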
The magnitude of the QLI delay required for throughput optimality is smaller due to the channel state having a smaller mixing time.

Figure 6-3: Symmetric arrival rate versus average queue backlog for a 4-queue system under different DLQ policies. Transition probabilities satisfy p = q = 0.01.

Figure 6-4: Symmetric arrival rate versus average queue backlog for a 4-queue system under different DLQ policies. Transition probabilities satisfy p = q = 0.1.

6.5 Conclusion

In this chapter, we designed a throughput optimal scheduling policy for a system in which channel states evolve over time according to a Markov process, and QLI, but not CSI, is available to the scheduler. We prove that the throughput optimal policy uses delayed QLI rather than current QLI, in contrast to the case where CSI is available to the transmitter. The required delay on the QLI depends on the mixing time of the channel state process. In general, the Markov channel state approaches steady state exponentially at a rate of p + q. As p + q approaches 1, the Markov process approaches an IID process, and current QLI can be used. Using further delayed QLI does not affect the overall throughput region; the drawback of using delayed QLI is increased packet delay. Therefore, if no CSI is available to the base station, the optimal policy must trade off throughput and delay.

6.6 Appendix

6.6.1 Proof of Theorem 26

Theorem 26. Consider the Delayed Longest Queue (DLQ) scheduling policy, which at time t schedules the channel which had the longest queue length at time t − τ_Q(ε), where τ_Q(ε) is defined in (6.11). For any arrival rate λ and ε > 0 satisfying λ + ε·1 ∈ Λ, the DLQ policy stabilizes the system.

Proof of Theorem 26. Let τ_Q = τ_Q(ε), where the dependence on ε is clear. Let Y(t) be the history of queue lengths in the system up to time t:

Y(t) = { Q(0), …, Q(t) }   (6.12)

The vector of delayed QLI forms a Markov chain. Define the following quadratic Lyapunov function:

L(Q(t)) = (1/2) ∑_{i=1}^M Q_i²(t).   (6.13)
The T-step Lyapunov drift is computed as

Δ_T(Y(t)) = E[ L(Q(t + T)) − L(Q(t)) | Y(t) ]   (6.14)

We show that under the DLQ policy, the T-step Lyapunov drift is negative for large backlogs, implying the stability of the system under the DLQ policy for all arrival rates within Λ; this follows from the Foster-Lyapunov criteria [49]. We bound the Lyapunov drift by combining (6.10), (6.13), and (6.14), and show that for large queue lengths this upper bound is negative. Let D_i(t) = D_i^DLQ(t) be the departure process under the DLQ policy.

Δ_T(Y(t)) ≤ E[ (1/2) ∑_{i=1}^M ( ( ∑_{k=0}^{T−1} A_i(t + k) )² + ( ∑_{k=0}^{T−1} D_i(t + k) )² )
        + ∑_{i=1}^M Q_i(t)( ∑_{k=0}^{T−1} A_i(t + k) − ∑_{k=0}^{T−1} D_i(t + k) ) | Y(t) ]   (6.15)
    ≤ B + E[ ∑_{i=1}^M Q_i(t)( ∑_{k=0}^{T−1} A_i(t + k) − ∑_{k=0}^{T−1} D_i(t + k) ) | Y(t) ]   (6.16)

where B is a finite constant, which exists due to the boundedness of the second moment of the arrival process. The difference between the queue lengths at any two times t and s is bounded by

|Q_i(t) − Q_i(s)| ≤ |t − s|,   (6.17)

which holds since the backlog can change by at most one packet per slot (an arrival with no departure, or vice versa). This inequality establishes a relationship between current queue lengths and delayed queue lengths:

Q_i(t) ≤ Q_i(t + k − τ_Q) + |k − τ_Q|   (6.18)
Q_i(t) ≥ Q_i(t + k − τ_Q) − |k − τ_Q|   (6.19)

The inequalities in (6.18) and (6.19) are used in (6.16) to upper bound the Lyapunov drift in terms of the delayed QLI for each slot, Q_i(t + k − τ_Q).
Δ_T(Y(t)) ≤ B + E[ ∑_{k=0}^{T−1} ∑_{i=1}^M Q_i(t)A_i(t + k) − ∑_{k=0}^{T−1} ∑_{i=1}^M Q_i(t)D_i(t + k) | Y(t) ]   (6.20)
    ≤ B + E[ ∑_{k=0}^{T−1} ∑_{i=1}^M (Q_i(t + k − τ_Q) + |k − τ_Q|)A_i(t + k)
        − ∑_{k=0}^{T−1} ∑_{i=1}^M (Q_i(t + k − τ_Q) − |k − τ_Q|)D_i(t + k) | Y(t) ]   (6.21)
    = B + ∑_{k=0}^{T−1} |k − τ_Q| E[ ∑_{i=1}^M (A_i(t + k) + D_i(t + k)) | Y(t) ]
        + E[ ∑_{k=0}^{T−1} ∑_{i=1}^M Q_i(t + k − τ_Q)A_i(t + k) − ∑_{k=0}^{T−1} ∑_{i=1}^M Q_i(t + k − τ_Q)D_i(t + k) | Y(t) ]   (6.22)
    ≤ B + 2M ∑_{k=0}^{T−1} |k − τ_Q| + E[ ∑_{k=0}^{T−1} ∑_{i=1}^M Q_i(t + k − τ_Q)( A_i(t + k) − D_i(t + k) ) | Y(t) ]   (6.23)
    ≤ B + 2MT² + E[ ∑_{k=0}^{T−1} ∑_{i=1}^M Q_i(t + k − τ_Q)( A_i(t + k) − D_i(t + k) ) | Y(t) ]   (6.24)
    ≤ B' + E[ ∑_{k=0}^{T−1} ∑_{i=1}^M Q_i(t + k − τ_Q)( λ_i − D_i(t + k) ) | Y(t) ]   (6.25)

Equation (6.23) follows from upper bounding the per-slot arrival and departure rates each by 1. Equation (6.24) follows from the fact that T ≥ τ_Q. Equation (6.25) follows by defining B' = B + 2MT² and using E[A_i(t + k)] = λ_i. Now consider the last term on the right-hand side of (6.25). This expectation can be rewritten by conditioning on the delayed QLI at the current slot t + k and using the law of iterated expectations, as in (5.60):

E[ ∑_{k=0}^{T−1} ∑_{i=1}^M Q_i(t + k − τ_Q)( λ_i − D_i(t + k) ) | Y(t) ]
    = E[ ∑_{k=0}^{T−1} E[ ∑_{i=1}^M Q_i(t + k − τ_Q)( λ_i − D_i(t + k) ) | Q(t + k − τ_Q) ] | Y(t) ]   (6.26)

To bound (6.26), we require the channel state at slot t + k to be independent of Y(t), which holds only in slots where k is sufficiently large. Thus, we break the summation in (6.25) into two parts: a smaller number of slots for which k is small, and a larger number of slots for which k is large. A trivially conservative bound is used for k < τ_Q, but the frame size is chosen to ensure the first τ_Q slots are a small fraction of the overall T slots.
Δ_T(Y(t)) ≤ B' + ∑_{k=0}^{τ_Q−1} E[ ∑_{i=1}^M Q_i(t + k − τ_Q)( λ_i − D_i(t + k) ) | Y(t) ]
        + ∑_{k=τ_Q}^{T−1} E[ ∑_{i=1}^M Q_i(t + k − τ_Q)( λ_i − D_i(t + k) ) | Y(t) ]   (6.27)

For values of k < τ_Q, an upper bound follows by trivially upper bounding the arrival rate by 1 and lower bounding the departures by 0 in each slot:

∑_{k=0}^{τ_Q−1} E[ ∑_{i=1}^M Q_i(t + k − τ_Q)( λ_i − D_i(t + k) ) | Y(t) ]
    ≤ ∑_{k=0}^{τ_Q−1} ∑_{i=1}^M E[ Q_i(t + k − τ_Q) | Y(t) ]   (6.28)
    ≤ ∑_{k=0}^{τ_Q−1} ∑_{i=1}^M Q_i(t − τ_Q) + ∑_{k=0}^{τ_Q−1} ∑_{i=1}^M k   (6.29)
    ≤ τ_Q ∑_{i=1}^M Q_i(t − τ_Q) + (1/2)τ_Q² M   (6.30)

where (6.29) follows from (6.17). Now consider slots for which k ≥ τ_Q. Let φ_i be a binary indicator variable denoting whether queue i is scheduled under the DLQ policy as a function of the delayed QLI. For these time slots, we evaluate the expected departure rate and compare it to the departure rate of the stationary policy in Lemma 17, which is known to stabilize the system. The interior expectation in (6.26) is expanded as

E[ ∑_{i=1}^M Q_i(t + k − τ_Q)( λ_i − D_i(t + k) ) | Q(t + k − τ_Q), Y(t) ]
    = ∑_{i=1}^M Q_i(t + k − τ_Q)( λ_i − E[ D_i(t + k) | Q(t + k − τ_Q), Y(t) ] )   (6.31)-(6.32)
    = ∑_{i=1}^M Q_i(t + k − τ_Q)λ_i − ∑_{i=1}^M Q_i(t + k − τ_Q) E[ φ_i(Q(t + k − τ_Q)) S_i(t + k) | Q(t + k − τ_Q), Y(t) ]   (6.33)
    = ∑_{i=1}^M Q_i(t + k − τ_Q)λ_i − ∑_{i=1}^M Q_i(t + k − τ_Q) φ_i(Q(t + k − τ_Q)) E[ S_i(t + k) | Q(t + k − τ_Q), Y(t) ].   (6.34)

Equation (6.34) follows since the scheduling decision under DLQ is completely determined by the delayed QLI. Note that the throughput optimal policy minimizes the expression in (6.34); however, the expectation cannot be computed, because it requires the conditional distribution of the channel state given the QLI, which in turn requires knowledge of the arrival rates. However, when the QLI is sufficiently delayed, the bound in (6.11) can be used to remove the conditioning on QLI as follows.
\begin{align}
\mathbb{P}\bigl(S_i(t+k)=s \,\big|\, \mathbf{Q}(t+k-\tau_Q), \mathbf{Y}(t)\bigr)
&= \sum_{s'\in\mathcal{S}} \mathbb{P}\bigl(S_i(t+k-\tau_Q)=s' \,\big|\, \mathbf{Q}(t+k-\tau_Q), \mathbf{Y}(t)\bigr) \nonumber\\
&\qquad\cdot \mathbb{P}\bigl(S_i(t+k)=s \,\big|\, S_i(t+k-\tau_Q)=s', \mathbf{Q}(t+k-\tau_Q), \mathbf{Y}(t)\bigr) \tag{6.35}\\
&= \sum_{s'\in\mathcal{S}} \mathbb{P}\bigl(S_i(t+k-\tau_Q)=s' \,\big|\, \mathbf{Q}(t+k-\tau_Q), \mathbf{Y}(t)\bigr)\,\mathbb{P}\bigl(S_i(t+k)=s \,\big|\, S_i(t+k-\tau_Q)=s'\bigr) \tag{6.36}\\
&\geq \sum_{s'\in\mathcal{S}} \mathbb{P}\bigl(S_i(t+k-\tau_Q)=s' \,\big|\, \mathbf{Q}(t+k-\tau_Q), \mathbf{Y}(t)\bigr)\Bigl(\mathbb{P}\bigl(S_i(t+k)=s\bigr) - \frac{\epsilon}{2}\Bigr) \tag{6.37}\\
&= \mathbb{P}\bigl(S_i(t+k)=s\bigr) - \frac{\epsilon}{2} \tag{6.38}
\end{align}

Equation (6.35) follows from the law of total probability. Due to the Markov property of the channel state, the state at time $t$ is conditionally independent of $\mathbf{Q}(t-\tau_Q)$ given $S(t-\tau_Q)$, leading to equation (6.36). Equation (6.37) holds from the definition of $\tau_Q$ in (6.11), which implies the conditional state distribution is within $\epsilon/2$ of the stationary distribution. The expression in (6.34) can now be bounded in terms of an unconditional expectation.

\begin{align}
&\sum_{i=1}^{M} Q_i(t+k-\tau_Q)\lambda_i - \sum_{i=1}^{M} Q_i(t+k-\tau_Q)\,\phi_i(\mathbf{Q}(t+k-\tau_Q))\,\mathbb{E}\bigl[S_i(t+k) \,\big|\, \mathbf{Q}(t+k-\tau_Q), \mathbf{Y}(t)\bigr] \nonumber\\
&\quad= \sum_{i=1}^{M} Q_i(t+k-\tau_Q)\lambda_i \nonumber\\
&\qquad - \sum_{i=1}^{M} Q_i(t+k-\tau_Q)\,\phi_i(\mathbf{Q}(t+k-\tau_Q))\,\mathbb{P}\bigl(S_i(t+k)=1 \,\big|\, \mathbf{Q}(t+k-\tau_Q), \mathbf{Y}(t)\bigr) \tag{6.39}\\
&\quad\leq \sum_{i=1}^{M} Q_i(t+k-\tau_Q)\lambda_i \nonumber\\
&\qquad - \sum_{i=1}^{M} Q_i(t+k-\tau_Q)\,\phi_i(\mathbf{Q}(t+k-\tau_Q))\,\mathbb{P}\bigl(S_i(t+k)=1\bigr) + \frac{\epsilon}{2}\sum_{i=1}^{M} Q_i(t+k-\tau_Q) \tag{6.40}
\end{align}

Equation (6.39) follows from the distribution of the channel state. The inequality in (6.40) follows from applying (6.38) and upper bounding $\phi_i(\mathbf{Q}) \leq 1$. Under the DLQ policy, the total service rate simplifies as

\begin{equation}
\sum_{i=1}^{M} \phi_i\bigl(\mathbf{Q}(t+k-\tau_Q)\bigr)Q_i(t+k-\tau_Q)P_S(1) = P_S(1)\max_i Q_i(t+k-\tau_Q). \tag{6.41}
\end{equation}

Combining equation (6.41) with equation (6.40) yields

\begin{equation}
\sum_{i=1}^{M} Q_i(t+k-\tau_Q)\lambda_i - P_S(1)\max_i Q_i(t+k-\tau_Q) + \frac{\epsilon}{2}\sum_{i=1}^{M} Q_i(t+k-\tau_Q). \tag{6.42}
\end{equation}

Now, we reintroduce the stationary policy of Lemma 17 to complete the bound. Recall that for any $\boldsymbol{\lambda} \in \Lambda$, there exists a stationary policy which schedules node $i$ for transmission with probability $\alpha_i$, and satisfies

\begin{equation}
\lambda_i + \epsilon \leq \alpha_i P_S(1) \quad \forall i \in \{1, \ldots, M\}. \tag{6.43}
\end{equation}

Note that the $\epsilon$ in the theorem statement and in (6.43) are designed to be equal.
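The delay $\tau_Q$ defined through (6.11) plays the role of a mixing time: the smallest delay after which the channel's conditional distribution is within $\epsilon/2$ of stationary, regardless of the conditioning. For a two-state ON/OFF (Gilbert-Elliott-style) channel this can be computed by iterating the one-step belief update; a sketch with assumed transition probabilities:

```python
def two_state_mixing_delay(p01, p10, eps):
    """Smallest tau such that, from either starting state, the two-state
    Markov channel's distribution is within eps/2 of its stationary
    distribution -- the property that tau_Q provides via (6.11).
    p01 = P(OFF -> ON), p10 = P(ON -> OFF)."""
    pi_on = p01 / (p01 + p10)   # stationary P(S = 1)
    # p_on[s] = P(S(tau) = 1 | S(0) = s) for starting states s in {0, 1}
    p_on = [0.0, 1.0]
    tau = 0
    while max(abs(p - pi_on) for p in p_on) > eps / 2:
        p_on = [p * (1 - p10) + (1 - p) * p01 for p in p_on]
        tau += 1
    return tau

# Illustrative channel: deviation shrinks by |1 - p01 - p10| each step.
tau_q = two_state_mixing_delay(p01=0.2, p10=0.3, eps=0.1)
```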
The expression in (6.42) is bounded by adding and subtracting identical terms corresponding to the stationary policy.

\begin{align}
&\sum_{i=1}^{M} Q_i(t+k-\tau_Q)\lambda_i - P_S(1)\max_i Q_i(t+k-\tau_Q) + \frac{\epsilon}{2}\sum_{i=1}^{M} Q_i(t+k-\tau_Q) \nonumber\\
&\qquad + \sum_{i=1}^{M} Q_i(t+k-\tau_Q)\alpha_i P_S(1) - \sum_{i=1}^{M} Q_i(t+k-\tau_Q)\alpha_i P_S(1) \tag{6.44}\\
&\quad= \sum_{i=1}^{M} Q_i(t+k-\tau_Q)\bigl(\lambda_i - \alpha_i P_S(1)\bigr) + \sum_{i=1}^{M} Q_i(t+k-\tau_Q)\alpha_i P_S(1) \nonumber\\
&\qquad - P_S(1)\max_i Q_i(t+k-\tau_Q) + \frac{\epsilon}{2}\sum_{i=1}^{M} Q_i(t+k-\tau_Q) \tag{6.45}\\
&\quad\leq -\epsilon\sum_{i=1}^{M} Q_i(t+k-\tau_Q) + \sum_{i=1}^{M} Q_i(t+k-\tau_Q)\alpha_i P_S(1) \nonumber\\
&\qquad - P_S(1)\max_i Q_i(t+k-\tau_Q) + \frac{\epsilon}{2}\sum_{i=1}^{M} Q_i(t+k-\tau_Q) \tag{6.46}\\
&\quad\leq -\frac{\epsilon}{2}\sum_{i=1}^{M} Q_i(t+k-\tau_Q) \tag{6.47}
\end{align}

Equation (6.46) follows from (6.43), and equation (6.47) follows from the fact that, since $\sum_i \alpha_i \leq 1$, the weighted sum of queue lengths is maximized by placing all the weight on the largest queue length.

To conclude the proof, the bound in (6.17) is used to revert the QLI at time $t+k-\tau_Q$ to a queue length at time $t-\tau_Q$, which is known through knowledge of $\mathbf{Y}(t)$.

\begin{equation}
-\frac{\epsilon}{2}\sum_{i=1}^{M} Q_i(t+k-\tau_Q) \leq -\frac{\epsilon}{2}\sum_{i=1}^{M} Q_i(t-\tau_Q) + \frac{\epsilon}{2}Mk \tag{6.48}
\end{equation}

The upper bound in (6.48) for slots $k \geq \tau_Q$ is combined with the bound in (6.30) for $k < \tau_Q$ to bound the drift in (6.27).

\begin{align}
\Delta_T(\mathbf{Y}(t)) &\leq B' + \tau_Q\sum_{i=1}^{M} Q_i(t-\tau_Q) + \frac{1}{2}\tau_Q^2 M \nonumber\\
&\qquad + \mathbb{E}\left[\sum_{k=\tau_Q}^{T-1}\Bigl(-\frac{\epsilon}{2}\sum_{i=1}^{M} Q_i(t-\tau_Q) + \frac{\epsilon}{2}Mk\Bigr) \,\middle|\, \mathbf{Y}(t)\right] \tag{6.49}\\
&\leq B' + \tau_Q\sum_{i=1}^{M} Q_i(t-\tau_Q) + \frac{1}{2}\tau_Q^2 M - \frac{\epsilon}{2}(T-\tau_Q)\sum_{i=1}^{M} Q_i(t-\tau_Q) + \frac{\epsilon}{2}M\sum_{k=\tau_Q}^{T-1} k \tag{6.50}\\
&\leq B' + \tau_Q\sum_{i=1}^{M} Q_i(t-\tau_Q) + \frac{1}{2}\tau_Q^2 M - \frac{\epsilon}{2}(T-\tau_Q)\sum_{i=1}^{M} Q_i(t-\tau_Q) + \frac{\epsilon M}{4}\bigl(T^2 - \tau_Q^2\bigr) \tag{6.51}\\
&\leq B' + \frac{1}{2}\tau_Q^2 M\Bigl(1 - \frac{\epsilon}{2}\Bigr) + \frac{\epsilon}{4}MT^2 + \tau_Q\sum_{i=1}^{M} Q_i(t-\tau_Q) - \frac{\epsilon}{2}(T-\tau_Q)\sum_{i=1}^{M} Q_i(t-\tau_Q) \tag{6.52}
\end{align}

Thus, for any $\xi > 0$, any $T$ satisfying

\begin{equation}
T \geq \frac{2}{\epsilon}\Bigl(1 + \frac{\epsilon}{2}\Bigr)\tau_Q + \frac{2}{\epsilon}\xi, \tag{6.53}
\end{equation}

and the positive constant $K$ satisfying

\begin{equation}
K = B' + \frac{1}{2}\tau_Q^2 M\Bigl(1 - \frac{\epsilon}{2}\Bigr) + \frac{\epsilon}{4}MT^2, \tag{6.54}
\end{equation}

it follows that

\begin{equation}
\Delta_T(\mathbf{Y}(t)) \leq K - \xi\sum_{i=1}^{M} Q_i(t-\tau_Q). \tag{6.55}
\end{equation}

Thus, for large enough queue backlogs, the $T$-slot Lyapunov drift is negative, and from [48] it follows that the overall system is stable.
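Conditions (6.53) and (6.54) are constructive: given $\epsilon$, $\tau_Q$, $\xi$, $M$, and $B'$, one can compute a frame length $T$ and a constant $K$ for which the coefficient on the total backlog is at most $-\xi$. A small sketch (all numerical inputs are assumed, for illustration only):

```python
import math

def frame_parameters(eps, tau_q, xi, M, B_prime):
    """Choose a frame length T per (6.53) and the constant K per (6.54)."""
    T = math.ceil((2 / eps) * (1 + eps / 2) * tau_q + (2 / eps) * xi)
    K = B_prime + 0.5 * tau_q**2 * M * (1 - eps / 2) + (eps / 4) * M * T**2
    return T, K

# Hypothetical problem parameters.
eps, tau_q, xi, M, B_prime = 0.1, 4, 1.0, 5, 10.0
T, K = frame_parameters(eps, tau_q, xi, M, B_prime)

# In (6.52) the total backlog is multiplied by tau_q - (eps/2)(T - tau_q);
# condition (6.53) guarantees this coefficient is at most -xi.
coeff = tau_q - (eps / 2) * (T - tau_q)
assert coeff <= -xi and K > 0
```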
Chapter 7

Concluding Remarks

In this thesis, we have studied the tradeoff between the amount and accuracy of the available control information and the achievable throughput for opportunistic scheduling. In wireless networks, the memory in the channel state process can be used to aid in scheduling, reducing the frequency with which channel state information (CSI) must be acquired. This is essential to cope with the increasing overheads of future wireless networks. We addressed three fundamental questions pertaining to the information overheads in wireless scheduling: What is the minimum amount of information required? What is the best information to learn? And how do we optimally control the network with limited or inaccurate information?

In Chapter 2, we analyzed channel probing as a means of acquiring network state information, and we developed optimal probing strategies by determining which channels to probe and how often to probe them. We showed that infrequent channel probing can be used to achieve high throughput in a multichannel system. In contrast to the work in [2, 71], which established the optimality of the myopic probe-best policy, we showed that for a slightly modified model these results no longer hold. In a two-channel system, we proved that probing either channel results in the same throughput, and in an infinite-channel system, we proved that a simple alternative, the probe-second-best policy, outperforms the probe-best policy in terms of average throughput. We proved the optimality of the probe-second-best policy in three-channel systems, and conjecture that probing the second-best channel is the optimal decision in a general multichannel system. Proving this conjecture remains an interesting open problem.

Next, we developed a fundamental lower bound on the rate at which CSI needs to be conveyed to the transmitter in order to achieve a throughput requirement.
We modeled this problem as a causal rate-distortion minimization with a unique distortion measure that quantifies the achievable throughput as a function of the CSI at the transmitter. For the case of two channels, we computed a closed-form expression for the causal rate distortion function, and proposed a practical encoding algorithm to achieve the required throughput with limited CSI overhead. Analytic results for larger systems are an area of future research.

In the second half of the thesis, we analyzed the scheduling problem over a wireless network. We developed a new model relating CSI delays to distance, reflecting the effect of transmission and propagation delays in conveying CSI across the network. Since centralized approaches are constrained to using this delayed information, we proved that in large networks, or when the channels have little memory, distributed scheduling can outperform the optimal centralized scheduler. Additionally, we characterized the expected throughput of centralized and distributed scheduling over tree and clique topologies.

Lastly, we characterized the effect that the controller location has on the ability to schedule transmissions over the network amidst delayed CSI. We showed that dynamically relocating the controller based on queue length information balances throughput across the network and provides system stability. We proposed a throughput-optimal joint controller placement and scheduling policy which stabilizes the system for any arrival rates within the throughput region. This policy uses delayed QLI to relocate the controller. Interestingly, when the controller placement cannot depend on CSI, significantly delayed QLI is required to decouple the QLI from the CSI. We investigated this property in general in Chapter 6, and proposed a throughput-optimal scheduling policy in a system where QLI is available, but not CSI.

Bibliography

[1] A. A. Abouzeid and N. Bisnik.
Geographic protocol information and capacity deficit in mobile wireless ad hoc networks. IEEE Transactions on Information Theory, 57(8):5133–5150, 2011.
[2] S. Ahmad, M. Liu, T. Javidi, Q. Zhao, and B. Krishnamachari. Optimality of myopic sensing in multichannel opportunistic access. IEEE Transactions on Information Theory, 2009.
[3] J. Andrews, S. Shakkottai, R. Heath, N. Jindal, M. Haenggi, R. Berry, D. Guo, M. Neely, S. Weber, S. Jafar, et al. Rethinking information theory for mobile ad hoc networks. IEEE Communications Magazine, 46(12):94–101, 2008.
[4] J. G. Andrews, A. Ghosh, and R. Muhamed. Fundamentals of WiMAX: Understanding Broadband Wireless Networking. Pearson Education, 2007.
[5] M. Andrews, K. Kumaran, K. Ramanan, A. Stolyar, R. Vijayakumar, and P. Whiting. Scheduling in a queuing system with asynchronously varying service rates. Probability in the Engineering and Informational Sciences, 18(2):191–217, 2004.
[6] T. Berger. Rate-Distortion Theory. Wiley Online Library, 1971.
[7] T. Berger. Explicit bounds to R(D) for a binary symmetric Markov source. IEEE Transactions on Information Theory, 23(1):52–59, 1977.
[8] D. P. Bertsekas and J. N. Tsitsiklis. Introduction to Probability. Athena Scientific, 2002.
[9] V. Borkar, S. Mitter, A. Sahai, and S. Tatikonda. Sequential source coding: an optimization viewpoint. In Proc. CDC-ECC '05, pages 1035–1042. IEEE, 2005.
[10] G. D. Celik, L. B. Le, and E. Modiano. Scheduling in parallel queues with randomly varying connectivity and switchover delay. In INFOCOM 2011 Proceedings, pages 316–320. IEEE, 2011.
[11] N. Chang and M. Liu. Optimal channel probing and transmission scheduling for opportunistic spectrum access. In Proceedings of the 13th Annual ACM International Conference on Mobile Computing and Networking, 2007.
[12] P. Chaporkar and A. Proutiere.
Optimal joint probing and transmission strategy for maximizing throughput in wireless systems. IEEE Journal on Selected Areas in Communications, 2008.
[13] M. Chiang and S. Boyd. Geometric programming duals of channel capacity and rate distortion. IEEE Transactions on Information Theory, 50(2):245–258, 2004.
[14] Cisco. Cisco visual networking index: Forecast and methodology, 2013–2018, 2014.
[15] T. M. Cover and J. A. Thomas. Elements of Information Theory. John Wiley & Sons, 2012.
[16] M. S. Daskin. Network and Discrete Location: Models, Algorithms, and Applications. John Wiley & Sons, 2011.
[17] A. Dimakis and J. Walrand. Sufficient conditions for stability of longest-queue-first scheduling: Second-order properties using fluid limits. Advances in Applied Probability, pages 505–521, 2006.
[18] B. P. Dunn and J. N. Laneman. Basic limits on protocol information in slotted communication networks. In ISIT 2008, pages 2302–2306. IEEE, 2008.
[19] R. Gallager. Basic limits on protocol information in data communication networks. IEEE Transactions on Information Theory, 1976.
[20] R. G. Gallager. Discrete Stochastic Processes. Kluwer Academic Publishers, 1996.
[21] R. G. Gallager. Stochastic Processes: Theory for Applications. Cambridge University Press, 2013.
[22] E. N. Gilbert. Capacity of a burst-noise channel. Bell System Technical Journal, 39(5):1253–1265, 1960.
[23] A. Gopalan, C. Caramanis, and S. Shakkottai. On wireless scheduling with partial channel-state information. In Proc. Annual Allerton Conference on Communication, Control and Computing, 2007.
[24] A. Gorbunov and M. S. Pinsker. Nonanticipatory and prognostic epsilon entropies and message generation rates. Problemy Peredachi Informatsii, 9(3):12–21, 1973.
[25] R. Gray. Information rates of autoregressive processes. IEEE Transactions on Information Theory, 16(4):412–421, 1970.
[26] J. L. Gross and J. Yellen. Graph Theory and Its Applications. CRC Press, 2005.
[27] S. Guha, K. Munagala, and S.
Sarkar. Jointly optimal transmission and probing strategies for multichannel wireless systems. In 40th Annual Conference on Information Sciences and Systems. IEEE, 2006.
[28] S. Guha, K. Munagala, and S. Sarkar. Optimizing transmission rate in wireless channels using adaptive probes. In ACM SIGMETRICS Performance Evaluation Review. ACM, 2006.
[29] B. Hajek and G. Sasaki. Link scheduling in polynomial time. IEEE Transactions on Information Theory, 34(5):910–917, 1988.
[30] J. Hong and V. O. Li. Impact of information on network performance: an information-theoretic perspective. In GLOBECOM 2009, pages 1–6. IEEE, 2009.
[31] K. Jagannathan, S. Mannor, I. Menache, and E. Modiano. A state action frequency approach to throughput maximization over uncertain wireless channels. In INFOCOM 2011 Proceedings. IEEE, 2011.
[32] K. Jagannathan, S. Mannor, I. Menache, and E. Modiano. A state action frequency approach to throughput maximization over uncertain wireless channels. Internet Mathematics, 9(2-3):136–160, 2013.
[33] K. Jagannathan and E. Modiano. The impact of queue length information on buffer overflow in parallel queues. In Proceedings of the 47th Annual Allerton Conference on Communication, Control, and Computing, pages 1103–1110. IEEE Press, 2009.
[34] S. Jalali and T. Weissman. New bounds on the rate-distortion function of a binary Markov source. In ISIT 2007, pages 571–575. IEEE, 2007.
[35] L. Jiang and J. Walrand. A distributed CSMA algorithm for throughput and utility maximization in wireless networks. IEEE/ACM Transactions on Networking, 18(3):960–972, 2010.
[36] M. Johnston and E. Modiano. Optimal channel probing in communication systems: The two-channel case. In IEEE Global Communications Conference (GLOBECOM). IEEE, 2013.
[37] M. Johnston, E. Modiano, and I. Keslassy. Channel probing in communication systems: Myopic policies are not always optimal.
In 2013 IEEE International Symposium on Information Theory (ISIT), pages 1934–1938. IEEE, 2013.
[38] M. Johnston, E. Modiano, and Y. Polyanskiy. Opportunistic scheduling with limited channel state information: A rate distortion approach. In Proc. IEEE ISIT, 2014.
[39] C. Joo, X. Lin, and N. B. Shroff. Understanding the capacity region of the greedy maximal scheduling algorithm in multihop wireless networks. IEEE/ACM Transactions on Networking, 17(4):1132–1145, 2009.
[40] C. Joo and N. B. Shroff. Performance of random access scheduling schemes in multi-hop wireless networks. IEEE/ACM Transactions on Networking, 17(5):1481–1493, 2009.
[41] K. Kar, X. Luo, and S. Sarkar. Throughput-optimal scheduling in multichannel access point networks under infrequent channel measurements. IEEE Transactions on Wireless Communications, 2008.
[42] P. Karn. MACA: a new channel access method for packet radio. In ARRL/CRRL Amateur Radio 9th Computer Networking Conference, volume 140, pages 134–140, 1990.
[43] Y. Y. Kim and S.-q. Li. Capturing important statistics of a fading/shadowing channel for network performance analysis. IEEE Journal on Selected Areas in Communications, 17(5):888–901, 1999.
[44] L. B. Le, E. Modiano, C. Joo, and N. B. Shroff. Longest-queue-first scheduling under SINR interference model. In Proceedings of the Eleventh ACM International Symposium on Mobile Ad Hoc Networking and Computing, pages 41–50. ACM, 2010.
[45] C.-p. Li and M. J. Neely. Exploiting channel memory for multiuser wireless scheduling without channel measurement: Capacity regions and algorithms. Performance Evaluation, 68(8):631–657, 2011.
[46] X. Lin and N. B. Shroff. The impact of imperfect scheduling on cross-layer rate control in wireless networks. In INFOCOM 2005: 24th Annual Joint Conference of the IEEE Computer and Communications Societies, volume 3, pages 1804–1814. IEEE, 2005.
[47] E. Modiano, D. Shah, and G. Zussman.
Maximizing throughput in wireless networks via gossiping. In ACM SIGMETRICS Performance Evaluation Review, volume 34, pages 27–38. ACM, 2006.
[48] M. Neely, E. Modiano, and C. Rohrs. Dynamic power allocation and routing for time-varying wireless networks. IEEE Journal on Selected Areas in Communications, 23(1):89–103, 2005.
[49] M. J. Neely. Dynamic power allocation and routing for satellite and wireless networks with time varying channels. PhD thesis, Massachusetts Institute of Technology, 2003.
[50] M. J. Neely. Stochastic network optimization with application to communication and queueing systems. Synthesis Lectures on Communication Networks, 3(1):1–211, 2010.
[51] D. Neuhoff and R. Gilbert. Causal source codes. IEEE Transactions on Information Theory, 28(5):701–713, 1982.
[52] M. Newman. Networks: An Introduction. Oxford University Press, 2010.
[53] J. Ni, B. Tan, and R. Srikant. Q-CSMA: Queue-length-based CSMA/CA algorithms for achieving maximum throughput and low delay in wireless networks. IEEE/ACM Transactions on Networking, 20(3):825–836, 2012.
[54] A. Pantelidou, A. Ephremides, and A. L. Tits. Joint scheduling and routing for ad-hoc networks under channel state uncertainty. In WiOpt 2007: 5th International Symposium on Modeling and Optimization in Mobile, Ad Hoc and Wireless Networks and Workshops, pages 1–8. IEEE, 2007.
[55] C. H. Papadimitriou and K. Steiglitz. Combinatorial Optimization: Algorithms and Complexity. Courier Dover Publications, 1998.
[56] A. Proutiere, Y. Yi, and M. Chiang. Throughput of random access without message passing. In CISS, pages 509–514, 2008.
[57] S. Sanghavi, L. Bui, and R. Srikant. Distributed link scheduling with constant overhead. In ACM SIGMETRICS Performance Evaluation Review, volume 35, pages 313–324. ACM, 2007.
[58] S. Sarkar and S. Ray. Arbitrary throughput versus complexity tradeoffs in wireless networks using graph partitioning. IEEE Transactions on Automatic Control, 53(10):2307–2323, 2008.
[59] D. Astely, E. Dahlman, A. Furuskar, Y. Jading, M. Lindstrom, and S. Parkvall. LTE: the evolution of mobile broadband. IEEE Communications Magazine, page 45, 2009.
[60] P. A. Stavrou and C. D. Charalambous. Variational equalities of directed information and applications. arXiv preprint arXiv:1301.6520, 2013.
[61] L. Tassiulas and A. Ephremides. Stability properties of constrained queueing systems and scheduling policies for maximum throughput in multihop radio networks. IEEE Transactions on Automatic Control, 37(12):1936–1948, 1992.
[62] L. Tassiulas and A. Ephremides. Dynamic server allocation to parallel queues with randomly varying connectivity. IEEE Transactions on Information Theory, 39(2):466–478, 1993.
[63] S. Tatikonda and S. Mitter. Control under communication constraints. IEEE Transactions on Automatic Control, 49(7):1056–1068, 2004.
[64] M. B. Teitz and P. Bart. Heuristic methods for estimating the generalized vertex median of a weighted graph. Operations Research, 16(5):955–961, 1968.
[65] D. Tse and P. Viswanath. Fundamentals of Wireless Communication. Cambridge University Press, 2005.
[66] J. Walrand and P. Varaiya. Optimal causal coding-decoding problems. IEEE Transactions on Information Theory, 29(6):814–820, 1983.
[67] H. S. Wang and N. Moayeri. Finite-state Markov channel: a useful model for radio communication channels. IEEE Transactions on Vehicular Technology, 44(1):163–171, 1995.
[68] H. Witsenhausen. On the structure of real-time source coders. Bell System Technical Journal, 58(6):1437–1451, 1979.
[69] L. Ying and S. Shakkottai. On throughput optimality with delayed network-state information. IEEE Transactions on Information Theory, 2011.
[70] L. Ying and S. Shakkottai. Scheduling in mobile ad hoc networks with topology and channel-state uncertainty. IEEE Transactions on Automatic Control, 57(10):2504–2517, 2012.
[71] Q. Zhao, B. Krishnamachari, and K. Liu. On myopic sensing for multi-channel opportunistic access: Structure, optimality, and performance. IEEE Transactions on Wireless Communications, 2008.
[72] M. Zorzi, R. R. Rao, and L. B. Milstein. On the accuracy of a first-order Markov model for data transmission on fading channels. In Proc. IEEE ICUPC '95, pages 211–215, 1995.
[73] G. Zussman, A. Brzezinski, and E. Modiano. Multihop local pooling for distributed throughput maximization in wireless networks. In INFOCOM 2008: The 27th Conference on Computer Communications. IEEE, 2008.