DBLA: DISTRIBUTED BLOCK LEARNING
ALGORITHM FOR CHANNEL SELECTION
IN COGNITIVE RADIO NETWORKS
- Chowdhury Sayeed Hyder and Li Xiao
Chowdhury Sayeed Hyder
Department of Computer Science & Engineering
Michigan State University
Outline

Background
◦ Cognitive Radio Network


Channel Selection Problem
Distributed Block Learning Algorithm
◦ Decision Period
◦ Channel Ranking
◦ Channel Switching

Simulation Results
◦ Regret
◦ Switching cost
wowmom 2012
Background
Figure: Current Spectrum Allocation in US
Figure: Underutilized Spectrum
Ref: I. Akyildiz, W. Lee, M. Vuran, and S. Mohanty, “NeXt Generation/Dynamic Spectrum Access/Cognitive Radio Wireless Networks: A Survey,” Computer Networks, 2006
Background

Current Status
◦ Spectrum Scarcity
◦ Underutilized spectrum

Cognitive radio (CR)
◦ Adapts its transmission and reception parameters
(frequency, modulation rate, power, etc.)

Cognitive Radio Network
◦ Two types of users
 Primary user or licensed user (PU)
 Secondary user or opportunistic user (SU)
◦ Requirements
 SUs must not interfere with ongoing PU transmissions
 SUs must vacate the spectrum when a PU arrives
Problem Statement

Channel Selection Problem
◦ Unknown PU activity
◦ Time varying channel condition
◦ Channel switching is not free!


Learning algorithm (exploration vs. exploitation)
Our goal is to design a distributed learning
algorithm that minimizes regret, minimizes
switching cost, and adapts to time varying
channels.
Problem Statement
The expected regret following policy $\hat{\rho}$
\[ R(\hat{\rho}, t) = S(\rho^{*}, t) - S(\hat{\rho}, t) + \Psi(\hat{\rho}, t) \]
◦ $S(\rho^{*}, t) - S(\hat{\rho}, t)$: difference in reward between optimal channel selection and channel selection by any learning algorithm
◦ $\Psi(\hat{\rho}, t)$: switching regret
Problem Statement

The expected reward following optimal policy $\rho^{*}$
\[ S(\rho^{*}, t) = t \sum_{i=1}^{U} r \, \mu_{i^{*}} \]

The expected reward following centralized policy $\rho_{cent}$
\[ S(\rho_{cent}, t) = r \sum_{i=1}^{C} \mu_i \, E[T_i(t)] \]

The expected reward following distributed policy $\rho_{dist}$
\[ S(\rho_{dist}, t) = r \sum_{i=1}^{C} \sum_{j=1}^{U} \mu_i \, E[V_{i,j}(t)] \]
Problem Statement

Switching regret
◦ Number of channel switches × unit switching cost
◦ Defined as the number of packets that could have been
transmitted during the time spent switching, had the user
not switched channels.
◦ Unit switching cost
\[ \text{unit switching cost} = \frac{\text{switching delay}}{\text{estimated packet transmission time}} \]
Ref: Y. Xiao and F. Hu, Cognitive Radio Networks, CRC Press, 2008
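As a small worked example of the two definitions above, the following Python sketch computes a unit switching cost and the resulting switching regret. The function names and the numbers (5 ms delay, 10 ms packet time, 40 switches) are hypothetical and chosen only for illustration.

```python
def unit_switching_cost(switching_delay: float, packet_tx_time: float) -> float:
    """Packets that could have been sent during one channel switch."""
    if packet_tx_time <= 0:
        raise ValueError("packet_tx_time must be positive")
    return switching_delay / packet_tx_time


def switching_regret(num_switches: int, unit_cost: float) -> float:
    """Switching regret = number of channel switches x unit switching cost."""
    return num_switches * unit_cost


# Hypothetical values, both times in milliseconds.
cost = unit_switching_cost(switching_delay=5.0, packet_tx_time=10.0)
print(cost)                         # 0.5 packets lost per switch
print(switching_regret(40, cost))   # 20.0 packets lost over 40 switches
```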
Problem Statement
The expected regret following centralized policy $\rho_{cent}$
\[ R(\rho_{cent}, t) = r \cdot t \sum_{i=1}^{U} \mu_{i^{*}} - r \sum_{i=1}^{C} \mu_i \, E[T_i(t)] + \eta(t) \cdot c \]
The expected regret following distributed policy $\rho_{dist}$
\[ R(\rho_{dist}, t) = r \cdot t \sum_{i=1}^{U} \mu_{i^{*}} - r \sum_{i=1}^{C} \sum_{j=1}^{U} \mu_i \, E[V_{i,j}(t)] + \eta(t) \cdot c \]
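The distributed regret can be evaluated numerically once the per-channel counts are known. The sketch below is one possible reading, not the authors' code: it assumes the Greek symbol in the sums is the channel idle probability, that the optimal policy places the U users on the U channels with the highest idle probability, and that the count inside the expectation is the number of slots in which user j transmitted alone on channel i. All function and parameter names are hypothetical.

```python
from typing import Sequence


def optimal_reward(idle_probs: Sequence[float], num_users: int,
                   t: int, r: float = 1.0) -> float:
    """Reward when the users occupy the num_users best channels for t slots."""
    best = sorted(idle_probs, reverse=True)[:num_users]
    return r * t * sum(best)


def distributed_reward(idle_probs: Sequence[float],
                       sole_use: Sequence[Sequence[float]],
                       r: float = 1.0) -> float:
    """sole_use[i][j]: slots in which user j was alone on channel i (assumed)."""
    return r * sum(p * sum(row) for p, row in zip(idle_probs, sole_use))


def distributed_regret(idle_probs, sole_use, num_switches, unit_cost,
                       t, r=1.0):
    """Reward gap to the optimal policy plus the switching regret."""
    num_users = len(sole_use[0])
    gap = (optimal_reward(idle_probs, num_users, t, r)
           - distributed_reward(idle_probs, sole_use, r))
    return gap + num_switches * unit_cost


# Hypothetical example: 3 channels, 2 users, 100 slots, 6 switches.
probs = [0.9, 0.5, 0.1]
counts = [[80, 0], [0, 70], [20, 30]]
print(distributed_regret(probs, counts, num_switches=6, unit_cost=0.5, t=100))
```

In this example the reward gap is about 28 and the switching regret is 3, so the total is about 31.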
Distributed Block Learning Algorithm

Formulate the channel selection problem as a
multi-armed bandit problem with multiple plays
and switching cost.

Present a distributed ‘block’ approach where
each user selects a channel independently
◦ Decision period (when)
◦ Channel ranking (on what)
◦ Channel switching (why)
◦ Channel adaptation (how)
Decision Period
\[ l_b = n_f \]
\[ l_f = C \cdot 2^{n_f^{2}} \]
Block and frame:
◦ Time slots are grouped into blocks, and blocks into
frames.
◦ Block length increases linearly and frame length
increases exponentially with the frame number
◦ All blocks in a frame are of equal length
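The block and frame layout can be enumerated with a short generator. This is a sketch under stated assumptions: block length equals the frame number, frame length is read as C times 2 raised to the squared frame number (one reading of the slide's formula, passed in as a replaceable function), the last block of a frame is truncated if the frame length is not a multiple of the block length, and decisions are taken at block boundaries. The names `block_schedule` and `default_frame_length` are hypothetical.

```python
from typing import Callable, Iterator, Tuple


def default_frame_length(num_channels: int, frame: int) -> int:
    """Assumed frame length: C * 2^(frame^2)."""
    return num_channels * 2 ** (frame ** 2)


def block_schedule(
    num_channels: int,
    horizon: int,
    frame_length: Callable[[int, int], int] = default_frame_length,
) -> Iterator[Tuple[int, int, int]]:
    """Yield (frame, start_slot, end_slot) per block; end_slot is exclusive."""
    start = 0
    frame = 1
    while start < horizon:
        frame_end = min(start + frame_length(num_channels, frame), horizon)
        while start < frame_end:
            # Every block in a frame has length equal to the frame number.
            end = min(start + frame, frame_end)
            yield frame, start, end
            start = end
        frame += 1


# Decision points for 2 channels over 40 slots.
for frame, start, end in block_schedule(num_channels=2, horizon=40):
    print(frame, start, end)
```

Passing a different `frame_length` function changes the growth rate without touching the rest of the schedule.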
Channel Ranking

Channel ranking based on
◦ Time average statistics
 What we already got from the channel
◦ Upper bound statistics
 What we expect from the channel
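One common way to combine a time-average statistic with an upper-bound statistic is a UCB1-style index. The slide does not give the exact formula, so the sketch below substitutes the standard UCB1 index as an assumption; the function names are hypothetical.

```python
import math
from typing import List, Sequence


def ucb_index(idle_count: int, sense_count: int, t: int) -> float:
    """Time-average idle rate plus an exploration bonus (UCB1 form, assumed)."""
    if sense_count == 0:
        return math.inf  # an unsensed channel is tried first
    time_average = idle_count / sense_count
    upper_bound_term = math.sqrt(2.0 * math.log(t) / sense_count)
    return time_average + upper_bound_term


def rank_channels(idle_counts: Sequence[int],
                  sense_counts: Sequence[int], t: int) -> List[int]:
    """Return channel ids sorted from highest to lowest index."""
    indices = [ucb_index(i, n, t) for i, n in zip(idle_counts, sense_counts)]
    return sorted(range(len(indices)), key=lambda ch: indices[ch], reverse=True)


# Channel 2 has never been sensed, so it ranks first.
print(rank_channels([8, 2, 0], [10, 10, 0], t=20))  # [2, 0, 1]
```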
Channel Switching


At each decision period, only one candidate channel
(chosen round robin) is compared with the current channel
Channel switching rule
◦ If the candidate channel has a higher expectation than
the current one
◦ If the current channel is not among the top-ranked channels
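The round-robin comparison can be sketched as follows. The slide lists two switching conditions without saying how they combine; this sketch treats either one as sufficient and takes "top rank" to mean the best `num_users` channels. Both choices, and all names, are assumptions.

```python
from typing import Sequence


def next_candidate(current: int, last_candidate: int, num_channels: int) -> int:
    """Advance round robin, skipping the current channel (needs >= 2 channels)."""
    candidate = (last_candidate + 1) % num_channels
    if candidate == current:
        candidate = (candidate + 1) % num_channels
    return candidate


def should_switch(current: int, candidate: int,
                  index: Sequence[float], num_users: int) -> bool:
    """Switch if the candidate looks better or the current channel fell out
    of the top num_users channels (either condition suffices here)."""
    ranked = sorted(range(len(index)), key=lambda ch: index[ch], reverse=True)
    top = ranked[:num_users]
    return index[candidate] > index[current] or current not in top


index = [0.5, 0.7, 0.2]
cand = next_candidate(current=0, last_candidate=0, num_channels=3)
print(cand, should_switch(0, cand, index, num_users=1))  # 1 True
```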
Channel Adaptation

Opportunity cost
◦ Increase the expectation of other channels if the idle
rate of the current channel is not consistent with its
overall idle rate.
◦ Increases the probability of switching
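A minimal sketch of the opportunity-cost idea: when the current channel's recent idle rate drifts away from its overall idle rate, the expectations of the other channels are raised so that a switch becomes more likely. The slide gives no formula, so the consistency test (`tolerance`) and the size of the increase (`boost`) are hypothetical parameters.

```python
from typing import List, Sequence


def adapt_indices(index: Sequence[float], current: int,
                  recent_idle_rate: float, overall_idle_rate: float,
                  tolerance: float = 0.2, boost: float = 0.1) -> List[float]:
    """Raise the index of every channel except the current one when the
    current channel's recent and overall idle rates disagree."""
    if abs(recent_idle_rate - overall_idle_rate) <= tolerance:
        return list(index)  # rates are consistent: nothing changes
    return [v if ch == current else v + boost for ch, v in enumerate(index)]


# The current channel (0) was idle 80% of the time overall but only 30% recently.
print(adapt_indices([0.8, 0.5, 0.4], current=0,
                    recent_idle_rate=0.3, overall_idle_rate=0.8))
```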
Simulation






Simulator: NS2
Channel idle/busy state in each slot follows a Bernoulli
distribution
Number of channels: 9
Number of users: 4-8
Time slots: 50000
Unit switching cost: 0.5
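The channel model in this setup is easy to reproduce outside NS2. The following standalone Python sketch (a stand-in, not the NS2 configuration used for the results) draws Bernoulli idle/busy states for 9 channels over 50000 slots. The idle probabilities are not listed on the slide, so the values here are hypothetical.

```python
import random
from typing import List, Sequence

NUM_CHANNELS = 9
NUM_SLOTS = 50_000
UNIT_SWITCHING_COST = 0.5


def simulate_idle_states(idle_probs: Sequence[float], num_slots: int,
                         seed: int = 0) -> List[List[bool]]:
    """states[t][ch] is True when channel ch is idle in slot t."""
    rng = random.Random(seed)
    return [[rng.random() < p for p in idle_probs] for _ in range(num_slots)]


# Hypothetical idle probabilities 0.1, 0.2, ..., 0.9.
idle_probs = [0.1 * (ch + 1) for ch in range(NUM_CHANNELS)]
states = simulate_idle_states(idle_probs, NUM_SLOTS)

# The empirical idle rates should be close to the configured probabilities.
empirical = [sum(slot[ch] for slot in states) / NUM_SLOTS
             for ch in range(NUM_CHANNELS)]
print([round(e, 3) for e in empirical])
```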
Results (Regret)
DBLA outperforms RAND in terms of regret minimization
Normalized Regret vs. time (with and without switching cost)
ρrand: A. Anandkumar, N. Michael, and A. Tang, “Opportunistic
Spectrum Access with Multiple Users: Learning under Competition,”
INFOCOM 2010
Results (Scalability)
With RAND, regret increases exponentially, whereas with DBLA
regret grows almost linearly.
Results (switching)
Regret vs. switching cost
# of Switching vs. # of users
DBLA incurs much less regret and fewer channel switches than RAND
Results (adaptability)
• Channel idle probabilities change every 10000 slots
Conclusion & Future Work

Learning algorithm to rank channels that
◦ minimizes regret
◦ minimizes switching
◦ is scalable
◦ adapts to dynamic channel conditions
Future Work
◦ More realistic channel model
◦ Theoretical analysis and proof of the upper bound
Questions?