DBLA: DISTRIBUTED BLOCK LEARNING ALGORITHM FOR CHANNEL SELECTION IN COGNITIVE RADIO NETWORKS
Chowdhury Sayeed Hyder and Li Xiao
Chowdhury Sayeed Hyder, Department of Computer Science & Engineering, Michigan State University

Outline
• Background
  ◦ Cognitive Radio Network
• Channel Selection Problem
• Distributed Block Learning Algorithm
  ◦ Decision Period
  ◦ Channel Ranking
  ◦ Channel Switching
• Simulation Results
  ◦ Regret
  ◦ Switching cost
wowmom 2012

Background
Figure: Current Spectrum Allocation in US
Figure: Underutilized Spectrum
Ref: I. Akyildiz, W. Lee, M. Vuran, and S. Mohanty, "NeXt Generation/Dynamic Spectrum Access/Cognitive Radio Wireless Networks: A Survey", Computer Networks, 2006

Background
• Current Status
  ◦ Spectrum scarcity
  ◦ Underutilized spectrum
• Cognitive radio (CR)
  ◦ Adapts its transmission and reception parameters (frequency, modulation rate, power, etc.)
• Cognitive Radio Network
  ◦ Two types of user: primary or licensed user (PU), and secondary or opportunistic user (SU)
  ◦ Requirements: an SU must not affect the ongoing transmissions of PUs, and must vacate the spectrum if a PU arrives

Problem Statement
• Channel Selection Problem
  ◦ Unknown PU activity
  ◦ Time-varying channel conditions
  ◦ Channel switching is not free!
• Learning algorithm (exploration vs. exploitation)
• Our goal is to design a distributed learning algorithm that minimizes regret, minimizes switching cost, and adapts to time-varying channels.
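To make the exploration/exploitation trade-off and the cost of switching concrete, here is a minimal sketch of a naive greedy learner. It is a hypothetical baseline for illustration only, not DBLA: the function name `run_greedy`, its parameters, and the assumption that each channel is idle independently with a fixed probability in every slot are all choices made for this example.

```python
import random

def run_greedy(idle_probs, slots, seed=0):
    """Hypothetical baseline, not DBLA: sense each channel once, then always
    move to the channel with the best observed idle rate."""
    rng = random.Random(seed)
    n = len(idle_probs)
    idle_seen = [0] * n   # slots in which a channel was found idle
    sensed = [0] * n      # slots in which a channel was sensed
    current, reward, switches = 0, 0, 0
    for t in range(slots):
        if t < n:
            choice = t  # exploration: try every channel once
        else:
            # exploitation: pick the highest time-average idle rate so far
            choice = max(range(n), key=lambda i: idle_seen[i] / sensed[i])
        if choice != current:
            switches += 1  # every hop is counted, since switching is not free
            current = choice
        # Assumed channel model: the PU is absent with probability idle_probs[choice]
        idle = rng.random() < idle_probs[choice]
        sensed[choice] += 1
        idle_seen[choice] += idle
        reward += idle  # one unit of reward per idle slot used
    return reward, switches

reward, switches = run_greedy([0.2, 0.5, 0.9], slots=1000, seed=1)
print(reward, switches)
```

A learner like this can lock onto a poor channel after one unlucky observation, or hop repeatedly between channels with similar estimates. Those are the two failure modes (regret and switching cost) that the stated goal targets.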
Problem Statement
• The expected regret following policy ρ:
  $R(\rho, t) = S(\rho^*, t) - S(\rho, t) + \omega(\rho, t)$
  ◦ $S(\rho^*, t) - S(\rho, t)$: difference in reward between optimal channel selection and channel selection by any learning algorithm
  ◦ $\omega(\rho, t)$: switching regret

Problem Statement
• The expected reward following the optimal policy ρ*:
  $S(\rho^*, t) = t \sum_{i=1}^{U} r_i^*$
• The expected reward following the centralized policy ρcent:
  $S(\rho_{cent}, t) = \sum_{i=1}^{C} r_i \, \mathbb{E}[T_i(t)]$
• The expected reward following the distributed policy ρdist:
  $S(\rho_{dist}, t) = \sum_{i=1}^{C} r_i \sum_{j=1}^{U} \mathbb{E}[T_{i,j}(t)]$
• Here U is the number of users, C the number of channels, $r_i$ the reward of channel i, $r_i^*$ the reward of the i-th best channel, and $T_i(t)$ (or $T_{i,j}(t)$ for user j) the number of slots spent on channel i up to time t.

Problem Statement
• Switching regret
  ◦ Number of switches × unit switching cost
  ◦ Defined as the number of packets that could have been transmitted during the switching time had the user not switched channels
  ◦ Unit switching cost = switching delay / estimated packet transmission time
Ref: Y. Xiao and F. Hu, Cognitive Radio Networks, CRC Press, 2008

Problem Statement
• The expected regret following the centralized policy ρcent:
  $R(\rho_{cent}, t) = t \sum_{i=1}^{U} r_i^* - \sum_{i=1}^{C} r_i \, \mathbb{E}[T_i(t)] + \nu(t)\, c$
• The expected regret following the distributed policy ρdist:
  $R(\rho_{dist}, t) = t \sum_{i=1}^{U} r_i^* - \sum_{i=1}^{C} r_i \sum_{j=1}^{U} \mathbb{E}[T_{i,j}(t)] + \nu(t)\, c$
• Here $\nu(t)$ is the number of switches up to time t and c is the unit switching cost, so the last term is the switching regret.

Distributed Block Learning Algorithm
• Formulate the channel selection problem as a multi-armed bandit problem with multiple plays and switching cost.
• Present a distributed 'block' approach in which each user selects its channel independently:
  ◦ Decision period (when)
  ◦ Channel ranking (on what)
  ◦ Channel switching (why)
  ◦ Channel adaptation (how)

Decision Period
• Block length $l_b = n_f$ and frame length $l_f = C \cdot 2^{n_f^2}$, where $n_f$ is the frame number
• Block and frame:
  ◦ Timeslots are arranged in blocks, and blocks are arranged in frames.
  ◦ Block length increases linearly, and frame length increases exponentially, with the frame number
  ◦ All blocks in a frame are of equal length

Channel Ranking
• Channel ranking is based on
  ◦ Time-average statistics: what we have already obtained from the channel
  ◦ Upper-bound statistics: what we expect from the channel

Channel Switching
• Only one candidate channel (chosen round robin) is compared with the current channel at each decision period
• Channel switching rule
  ◦ If the candidate channel has a higher expectation than the current one
  ◦ If the current channel is not in the top rank

Channel Adaptation
• Opportunity cost
  ◦ Increase the expectation of the other channels if the idle rate of the current channel is not consistent with its overall idle rate
  ◦ This increases the probability of switching

Simulation
• Simulator: NS2
• Each channel's idle state follows a Bernoulli distribution
• Number of channels: 9
• Number of users: 4-8
• Time slots: 50000
• Unit switching cost: 0.5

Results (Regret)
• DBLA outperforms RAND in terms of regret minimization
Figure: Normalized regret vs. time (with and without switching cost)
ρrand: A. Anandkumar, N. Michael, and A. Tang, "Opportunistic Spectrum Access with Multiple Users: Learning Under Competition", INFOCOM 2010

Results (Scalability)
• With RAND, regret increases exponentially, while with DBLA the rate of change in regret is almost linear.

Results (Switching)
Figure: Regret vs. switching cost
Figure: Number of switches vs. number of users
• DBLA has much lower regret and fewer switches than RAND

Results (Adaptability)
• Channel idle probabilities change every 10000 slots

Conclusion & Future Work
• Learning algorithm to rank channels which
  ◦ minimizes regret
  ◦ minimizes switching
  ◦ is scalable
  ◦ adapts to dynamic channel conditions
• Future Work
  ◦ More realistic channel model
  ◦ Theoretical analysis and proof of the upper bound

Questions?
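As a worked example of the regret accounting from the Problem Statement slides (the reward gap to the optimal policy, plus the number of switches times the unit switching cost), the following sketch may help. The function name `expected_regret`, its parameters, and the channel rewards and slot counts are hypothetical values chosen for illustration. Only the unit switching cost of 0.5 is taken from the simulation setup.

```python
def expected_regret(rewards, plays, n_users, t, n_switches, unit_cost):
    """Regret = optimal reward - obtained reward + switching regret."""
    # Optimal policy: the n_users best channels are used in each of the t slots.
    best = sorted(rewards, reverse=True)[:n_users]
    optimal_reward = t * sum(best)
    # Obtained reward: channel reward times the user-slots spent on that channel.
    obtained_reward = sum(r * n for r, n in zip(rewards, plays))
    # Switching regret: number of switches times the unit switching cost.
    switching_regret = n_switches * unit_cost
    return optimal_reward - obtained_reward + switching_regret

# Illustrative numbers: 3 channels, 2 users, 100 slots (200 user-slots in total).
regret = expected_regret(
    rewards=[0.9, 0.8, 0.5],
    plays=[90, 80, 30],
    n_users=2,
    t=100,
    n_switches=4,
    unit_cost=0.5,
)
print(round(regret, 6))
```

With these numbers the regret comes to about 12: the optimal reward is about 170, the obtained reward about 160, and four switches at cost 0.5 add 2. A policy that earns the same reward with fewer switches therefore has strictly lower regret.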