Reinforcement Learning Based Spectrum-aware Routing in Multihop Cognitive Radio Networks
Advisor: Wei-Yeh Chen
Student: 張政偉
M. H. Wahab, Y. Yang and M. Sooriyabandara, "Reinforcement Learning Based Spectrum-aware Routing in Multi-hop Cognitive Radio Networks," CROWNCOM, Hannover, Germany, pp. 1-5, June 2009.
2016/7/12

Outline
Introduction
System Model
Reinforcement Learning Based Spectrum-Aware Routing Algorithms
Simulation
Conclusion

Introduction (1/2)
Today's wireless networks are characterized by fixed spectrum assignment policies. Because licensed users occupy their bands only sporadically, this policy often leaves large portions of the spectrum unused.

Introduction (2/2)
Multi-hop cognitive radio is a novel solution to the problem of scarce spectrum resources. It enables unlicensed users (secondary users) to find transmission opportunities by exploiting the idle periods of licensed users (primary users).

System Model
Fig. 1 shows the multi-hop network topology (simulated in OMNeT++).

System Model (cont.)
When node x wants to communicate with its neighbouring node y:
1. The MAC layer determines which channels are free, as detected by node x's PHY layer.
2. Node x sends a Request to Send (RTS) packet on the first free channel C(1), i.e. RTS[C(1)].
3. If x does not receive a Clear to Send (CTS) packet, i.e. CTS[C(1)], within time τ, it assumes that y cannot communicate on C(1) during that time slot.
4. x sends RTS[C(2)], RTS[C(3)], and so on to y, repeating the process until it receives a CTS from y on the same channel on which the RTS was sent.
5. y now knows to listen on the channel on which the CTS was sent, and communication proceeds on that channel until the packet is transferred successfully.

RL Based Spectrum-Aware Routing Algorithms
Routing Table of Q-Values
Spectrum-aware Q-routing
Spectrum-aware DRQ-routing

Routing Table of Q-Values
[Figure: the Q-value routing table]
[Figure: the transmission procedure]

Spectrum-aware Q-routing
1. When node x receives a packet from node s destined for node d, it sends the packet to the neighbour node y with the maximum Q value.
2. Node x receives feedback containing node y's maximum Q value for destination d.
3. Node x updates its Q value using this feedback.
4. If node y is not the destination, it repeats this cycle.

Spectrum-aware DRQ-routing
1. When node x receives a packet from node s destined for node d, it sends the packet to the neighbour node y with the maximum Q value.
2. Node y receives node x's maximum Q value for destination s.
3. Node x receives feedback containing node y's maximum Q value for destination d.
4. Nodes x and y update the corresponding Q values in their routing tables with the received information.
5. If node y is not the destination, it repeats this cycle.

Simulation
Three protocols are compared:
spectrum-aware shortest path protocol
Q-routing
spectrum-aware Q-routing
Three network load conditions are used:
Low load: 0.5 to 1.5 packets/s
Medium load: 1.75 to 2.25 packets/s
High load: > 2.5 packets/s

Low load
[Figure: results under low load]

Medium load
[Figure: results under medium load]

High load
[Figure: results under high load]

Conclusion
The paper uses reinforcement learning to produce optimized Q-values. When a channel is needed, the next hop can be looked up directly from the Q-value table, which is far more efficient than shortest-path routing.
Comparing the case where a single node holds Q-values with the case where every node holds Q-values: because both sides propose their maximum Q values, a link can be set up immediately when they match, which improves efficiency.

Appendix: Spectrum-aware Q-routing

Appendix: Spectrum-aware DRQ-routing
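The RTS/CTS channel negotiation described in the System Model can be sketched as a loop that probes the free channels in order. This is a minimal Python illustration, not the authors' code: `negotiate_channel` and the `cts_received` callback are hypothetical names, the callback stands in for "x waited up to τ and did (or did not) get a CTS from y on this channel", and the channel numbers in the example are made up.

```python
from typing import Callable, Optional, Sequence, Tuple


def negotiate_channel(
    free_channels: Sequence[int],
    cts_received: Callable[[int], bool],
) -> Tuple[Optional[int], int]:
    """Sequential RTS/CTS channel probing between node x and neighbour y.

    free_channels: channels x's PHY senses as free, in the order C(1), C(2), ...
    cts_received:  hypothetical stand-in for "did y answer with a CTS on this
                   channel within the timeout tau?"
    Returns (agreed channel or None, number of RTS packets sent).
    """
    attempts = 0
    for channel in free_channels:
        attempts += 1                 # x sends RTS[C(i)]
        if cts_received(channel):     # y replied CTS[C(i)] before tau expired
            return channel, attempts  # both sides now use C(i) for this packet
        # No CTS within tau: assume y cannot use C(i) in this slot; try the next.
    return None, attempts             # no common free channel in this slot


if __name__ == "__main__":
    x_free = [1, 3, 4]  # made-up channels sensed free at x
    y_busy = {1}        # made-up channel occupied by a primary user near y
    print(negotiate_channel(x_free, lambda c: c not in y_busy))  # (3, 2)
```

Returning the number of RTS attempts alongside the channel is a choice made for this sketch; a count like this is one plausible ingredient of a spectrum-aware link cost, but the slides do not say how the paper measures it.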
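The four spectrum-aware Q-routing steps can be sketched with a per-node table q[d][y]. The slides do not give the update rule, so this sketch assumes a standard Q-learning update Q_x(d, y) ← Q_x(d, y) + α(r + max_z Q_y(d, z) − Q_x(d, y)) with an assumed learning rate α = 0.5 and a hypothetical per-hop reward r equal to minus the number of RTS attempts a link needs, so a larger Q means fewer expected channel negotiations. The topology, costs, and names (`QNode`, `forward_one_hop`, `route_packet`) are illustrative only.

```python
from collections import defaultdict
from typing import Callable, Dict, List

ALPHA = 0.5  # assumed learning rate (not given in the slides)


class QNode:
    """A node's routing table of Q-values: q[dest][neighbour] -> estimated value."""

    def __init__(self, name: str, neighbours: List[str]) -> None:
        self.name = name
        self.neighbours = neighbours
        # Unseen destinations start at 0.0 for every neighbour.
        self.q: Dict[str, Dict[str, float]] = defaultdict(
            lambda: {y: 0.0 for y in self.neighbours})

    def best_next_hop(self, dest: str) -> str:
        row = self.q[dest]
        return max(row, key=row.get)  # ties go to the first neighbour listed

    def best_value(self, dest: str) -> float:
        # A node that is itself the destination has nothing left to deliver.
        return 0.0 if dest == self.name else max(self.q[dest].values())


def forward_one_hop(nodes: Dict[str, QNode], x: str, dest: str,
                    reward: Callable[[str, str], float]) -> str:
    node_x = nodes[x]
    y = node_x.best_next_hop(dest)        # step 1: send to the max-Q neighbour
    feedback = nodes[y].best_value(dest)  # step 2: y reports its max Q for d
    old = node_x.q[dest][y]               # step 3: x updates with the feedback
    node_x.q[dest][y] = old + ALPHA * (reward(x, y) + feedback - old)
    return y


def route_packet(nodes: Dict[str, QNode], src: str, dest: str,
                 reward: Callable[[str, str], float], max_hops: int = 20) -> List[str]:
    path = [src]
    while path[-1] != dest and len(path) <= max_hops:  # step 4: y repeats
        path.append(forward_one_hop(nodes, path[-1], dest, reward))
    return path


# Hypothetical RTS attempts per link: s-a-d is cheap, s-b-d is expensive.
COST = {frozenset(("s", "a")): 1, frozenset(("a", "d")): 1,
        frozenset(("s", "b")): 3, frozenset(("b", "d")): 1}


def reward(x: str, y: str) -> float:
    return -COST[frozenset((x, y))]


nodes = {"s": QNode("s", ["b", "a"]), "a": QNode("a", ["s", "d"]),
         "b": QNode("b", ["s", "d"]), "d": QNode("d", ["a", "b"])}
for _ in range(50):
    route_packet(nodes, "s", "d", reward)
print(nodes["s"].best_next_hop("d"))  # a
```

Because s lists b first, the early packets try the expensive route through b and lower its Q-value. Every entry starts at 0 and every assumed reward is negative, so untried next hops look optimistic until they are tried, and s settles on a.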
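Spectrum-aware DRQ-routing adds a backward exchange: while x learns about destination d from y, y learns about the source s from x. The sketch below is a minimal illustration under stated assumptions, since the slides give neither the update rule nor the reward: it assumes a standard Q-learning update with learning rate α = 0.5 and a hypothetical reward of −1 per hop, applied once in each direction. Node names, the chain topology, and the functions `drq_forward_one_hop` and `drq_route_packet` are illustrative.

```python
from collections import defaultdict
from typing import Callable, Dict, List

ALPHA = 0.5  # assumed learning rate (not given in the slides)


class QNode:
    """Routing table of Q-values: q[dest][neighbour] -> estimated value."""

    def __init__(self, name: str, neighbours: List[str]) -> None:
        self.name = name
        self.neighbours = neighbours
        self.q: Dict[str, Dict[str, float]] = defaultdict(
            lambda: {n: 0.0 for n in self.neighbours})

    def best_next_hop(self, dest: str) -> str:
        row = self.q[dest]
        return max(row, key=row.get)  # ties go to the first neighbour listed

    def best_value(self, dest: str) -> float:
        return 0.0 if dest == self.name else max(self.q[dest].values())

    def learn(self, dest: str, via: str, r: float, feedback: float) -> None:
        old = self.q[dest][via]
        self.q[dest][via] = old + ALPHA * (r + feedback - old)


def drq_forward_one_hop(nodes: Dict[str, QNode], x: str, src: str, dest: str,
                        reward: Callable[[str, str], float]) -> str:
    y = nodes[x].best_next_hop(dest)          # step 1: send to the max-Q neighbour
    backward = nodes[x].best_value(src)       # step 2: x's max Q for s, sent to y
    forward = nodes[y].best_value(dest)       # step 3: y's max Q for d, fed back to x
    nodes[x].learn(dest, y, reward(x, y), forward)  # step 4: x updates toward d
    nodes[y].learn(src, x, reward(y, x), backward)  # step 4: y updates toward s
    return y


def drq_route_packet(nodes: Dict[str, QNode], src: str, dest: str,
                     reward: Callable[[str, str], float], max_hops: int = 20) -> List[str]:
    path = [src]
    while path[-1] != dest and len(path) <= max_hops:  # step 5: y repeats
        path.append(drq_forward_one_hop(nodes, path[-1], src, dest, reward))
    return path


def reward(x: str, y: str) -> float:
    return -1.0  # hypothetical: every link costs one RTS attempt


nodes = {"s": QNode("s", ["m"]), "m": QNode("m", ["d", "s"]), "d": QNode("d", ["m"])}
path = drq_route_packet(nodes, "s", "d", reward)
print(path)                 # ['s', 'm', 'd']
print(nodes["d"].q["s"]["m"])  # -0.5
```

After a single packet from s to d, node d already holds an estimate for reaching s via m. A forward-only update would leave that entry at its initial value, so this backward information is what the second exchange adds.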