Proceedings of the 7th Annual ISC Graduate Research Symposium ISC-GRS 2013

April 24, 2013, Rolla, Missouri
COMPARATIVE ANALYSIS OF HARDWARE ACCELERATED COGNITIVE
NETWORKS
Nathan Price
Department of Electrical and Computer Engineering
Missouri University of Science and Technology, Rolla, MO 65409
ABSTRACT
Wireless digital communication systems have long faced
the problem of optimum bandwidth utilization in multi-user
environments. Cognitive radio networks aim at addressing this
problem through intelligent, flexible radio systems. Such
adaptable radio systems should resolve tradeoffs between user
demanded quality of service (QoS) and spectrum conditions. In
the past, cognitive radio systems implemented most of the
signal processing and networking protocols in software. Such
solutions rely on general-purpose processors to implement
changing communication schemes at a cost of reduced
throughput, increased delays, and no guarantee of
minimum QoS. On the other hand, many dedicated systems
implement communication schemes in hardware using
one-time-programmable chips. Recent works consider a dynamic
environment where certain functions and schemes can be
implemented either in hardware on an FPGA or in software depending
on required network performance and available hardware
resources. This work investigates the design challenges of
blending hardware and software in an efficient manner across
two different hierarchical levels of design.
simple transceiver feeding ADCs that then connect to a
general-purpose processor (GPP). Entire radio systems can be created
and modified in software without the need for purpose-built
hardware. SDRs usually consist of a real-time software
framework upon which libraries of dedicated radio objects are
built. These objects are then linked together to form a radio
system. However, the drawback of such a flexible,
software-centered solution is its computational overhead and, often,
its uncertain performance. These systems must run in real time;
that is, each discrete-time radio sample has to be
processed and handled at a rate greater than or equal to the
sampling rate of the system. Consequently, the processing
overhead quickly increases with demand. Using processors
with higher computational power often leads to increased
power requirements. This is especially detrimental in mobile,
battery-operated platforms, whose power budget limits the available
processing power. This helps to explain why SDRs have
yet to become commercially viable in a consumer market.
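The real-time requirement above can be made concrete with a small budget check. The following is only a sketch under stated assumptions: the function name, the idea of counting a fixed number of operations per sample, and all numbers are hypothetical and are not measurements from this work.

```python
def realtime_margin(sample_rate_hz: float,
                    ops_per_sample: float,
                    processor_ops_per_s: float) -> float:
    """Ratio of the achievable sample-processing rate to the sampling rate.

    A result >= 1.0 means samples can be processed at least as fast as
    they arrive; a result < 1.0 means the datapath falls behind.
    All three inputs are hypothetical planning figures.
    """
    if sample_rate_hz <= 0 or ops_per_sample <= 0 or processor_ops_per_s <= 0:
        raise ValueError("all arguments must be positive")
    achievable_rate_hz = processor_ops_per_s / ops_per_sample
    return achievable_rate_hz / sample_rate_hz


# Hypothetical GPP: 1e9 operations/s, 200 operations needed per sample.
print(realtime_margin(4e6, 200, 1e9))   # margin at a 4 MS/s sampling rate
print(realtime_margin(10e6, 200, 1e9))  # margin at a 10 MS/s sampling rate
```

In this toy setting the same processor keeps up at the lower sampling rate but not at the higher one, which is the situation in which offloading to dedicated hardware becomes attractive.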
1. INTRODUCTION
Cognitive radio (CR) systems are digital radio networks
with an intelligent control system for channel access. This
control system or “cognitive engine” can be viewed as a smart
resource dispatcher. It checks spectrum conditions and makes
decisions about where and how to transmit [1]. For example, in
a busy band of radio spectrum it may make a decision to use a
narrow bandwidth modulation to avoid interference with
neighbors. Alternatively, it may choose to transmit only on free
channels rather than occupied ones, or it may decide to use
another band entirely. Furthermore, a CR system should
continuously adapt to changing environmental conditions. For
example, when a high noise level reduces the SNR, a CR
system might try to adapt by increasing transmit power,
increasing forward error correction, changing modulation
techniques, or using a combination of these. However,
such a wide range of adaptability is a challenge for conventional
radio electronics.
A typical cognitive radio platform employs an FPGA-based
software-defined radio (SDR). The SDR is simply a radio
system in which most of the digital signal processing is
accomplished in software, as shown in Figure 1. Conventional
radio components, including filters and modulators, are
implemented using discrete-time signal processing techniques
in software [2]. An SDR’s data path is usually comprised of a
Figure 1 - Typical architecture model for a cognitive radio.
(Blocks shown: cognitive engine, FPGA, TX/RX, system bus,
general purpose processor.)
Figure 2 - A) Typical software defined radio datapath (TX/RX,
ADC/DAC, general purpose processor). B) Software defined radio
datapath with the addition of an FPGA (TX/RX, ADC/DAC, FPGA,
general purpose processor).
Many SDR designs are equipped with a field-programmable
gate array (FPGA), as shown in Figure 2.B.
The intention is to implement the common, fixed processing
steps in hardware for improved performance while maintaining
the flexibility of reprogramming the FPGA when an improved
algorithm is developed. The reconfigurable nature of FPGAs
allows purpose-built, dedicated hardware to be quickly
deployed to meet the demands of modern communication
schemes. Moreover, FPGAs meet the high computational
power requirements of high-speed digital communication, since
they can be configured as dedicated single-function hardware
that runs orders of magnitude faster and more efficiently than a
GPP. However, existing works employ the FPGA to
implement only static functions (a fixed configuration). The
proposed work explores the idea of on-the-fly reprogramming
of the FPGA to adapt to varying network demands and channel
conditions.
problem. In general, the decision-making would consider
tradeoffs between performance and resource optimization. A
PR module may be built in a very high performance, parallel
manner and consume much of an FPGA’s available resources,
or it may be built in a more resource conservative, serial
fashion but run slower.
2. CURRENT RESEARCH
In this section, we consider a high-level optimization of
the hardware-software architecture. In traditional CR approaches,
the FPGA is programmed with as many of the required radio
modules as the FPGA resources allow. While [4] proposes
novel methods of swapping a DSSS block in and out on a
receiver, the authors do not consider the case in which all
required radio modules simply will not fit on the FPGA despite
leveraging PR. In such overflow cases, the CR would have to
fall back to a hybrid system based partially in software and
partially in hardware. Resource-intensive operations, including
down-conversion and filtering, should be performed on the
FPGA, while less intensive operations can be performed on the
GPP. As demand on the system is reduced, the overflow
processes can be shifted back to the FPGA.
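The overflow fallback described above can be illustrated with a simple greedy partition. This is a minimal sketch, not the method of this work or of [4]: the `Module` fields, the resource units, the greedy rule, and the example numbers are all hypothetical. Only the idea that the most intensive operations (such as down-conversion and filtering) go to the FPGA first comes from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Module:
    name: str
    fpga_area: int    # hypothetical FPGA resource units the module occupies
    intensity: float  # hypothetical relative computational demand


def partition(modules, fpga_capacity):
    """Greedy split: most intensive modules claim FPGA area first,
    anything that no longer fits overflows to the GPP."""
    on_fpga, on_gpp = [], []
    free = fpga_capacity
    for m in sorted(modules, key=lambda m: m.intensity, reverse=True):
        if m.fpga_area <= free:
            on_fpga.append(m.name)
            free -= m.fpga_area
        else:
            on_gpp.append(m.name)
    return on_fpga, on_gpp


radio = [
    Module("down_conversion", fpga_area=40, intensity=9.0),
    Module("filtering", fpga_area=30, intensity=8.0),
    Module("demodulation", fpga_area=35, intensity=5.0),
    Module("framing", fpga_area=10, intensity=1.0),
]

print(partition(radio, fpga_capacity=80))   # constrained FPGA: one module overflows
print(partition(radio, fpga_capacity=200))  # ample FPGA: everything shifts back
```

A greedy rule is used here only because it is easy to read; it does not guarantee an optimal placement, and a real cognitive engine would also weigh bus traffic and energy.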
Additionally, secondary objectives of the CR, for example
energy consumption, often have to be taken into consideration.
In the previously mentioned scenario, the end goal is
overall performance, which may do for a CR serving as a
network controller, but in a mobile device minimal energy
consumption becomes an important optimization objective.
There may be a better configuration in which the
mobile device’s GPP performs most or all of the computation
while the FPGA is nearly or completely idle and powered
down.
New FPGA technology holds the potential for extremely
flexible yet efficient CR hardware. Newer FPGAs have begun
to offer partial reconfiguration (PR), meaning the FPGA’s
design can be altered in real time through the use of modular
design blocks [3]. Similar to how SDRs have software libraries
of radio components that can be called in software as required,
a PR FPGA can have PR modules stored in a local memory that
can be deployed on chip as demanded by the system [3,4]. While
this design is technically still “software defined,” it is
implemented on faster and more efficient hardware.
Figure 3 - Cognitive radio datapath with a load distributed across a
general-purpose processor (GPP) and an FPGA. (Blocks shown: Object A
on the GPP; Modules B, C, D, and E on the FPGA; connecting bus.)
The combination of traditional SDRs and PR FPGAs
to handle heavy processing is a new and promising topic in
digital communications. The general overview is shown in
Figure 3, where various modules are implemented either on the
FPGA or on the GPP. Virginia Tech’s CR research [1] has looked at the
challenges of CR systems on the whole as well as
the fine details of on-the-fly reconfiguration of FPGAs.
These fine details include placement of modules and
interconnect routing. The IEEE has also published its P1900.4
draft specification for a CR network. Even this early spec calls
for the use of PR FPGAs [5]. The work in [4] looks at the state of the FPGA
industry for partial reconfiguration by exploring Xilinx’s design
tools and hardware in a few different CR scenarios.
Spanning multiple levels of a CR system is the issue of
performance and resource allocation. The high-level architecture
of a CR consists of the cognitive engine, processing hardware,
transceivers, and the links between these components. CR
systems may consist of one or multiple instances of each
component depending on the network state and application
requirements. In contrast, low-level design decisions
involving PR modules and SDR objects would allow higher
flexibility at a cost of increased complexity of the optimization
3. HIGH-LEVEL ARCHITECTURE DECISIONS
Figure 4 - Left) A cognitive radio architecture splitting workloads
across the system bus where layer hierarchy is preserved. Right)
A cognitive radio architecture splitting workloads where layer
hierarchy is not preserved. Notice the difference in bus
transitions. (Blocks shown: GPP objects and FPGA modules A
through E on either side of the system bus.)
Another component in CR architecture that cannot be
overlooked is the system bus. CR systems will contain some
version of a networking stack. Each layer and/or sublayer of
this stack will be instantiated as either an SDR object or a PR
module. Depending on the nature of the layer, it may run more
efficiently on the GPP or on the FPGA. It is possible that
processing efficiency will not follow the network layer
hierarchy; that is, lower layers may run better on the GPP while
higher layers may run better on the FPGA. In such a case, it is
important to consider both the speed and the bandwidth of the link
between the two processing units. The demand on the interconnect bus
increases with the number of layer transitions across it, which can
turn the bus into a communication bottleneck.
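The effect of layer placement on bus demand can be sketched by counting how often the datapath changes processing unit. The code below is an illustration under assumptions: the two placements loosely mirror the two cases of Figure 4, and the function names and per-link data rates are hypothetical.

```python
def bus_crossings(placement):
    """Count transitions between processing units along the datapath.

    `placement` lists, in datapath order, where each layer runs
    (for example "GPP" or "FPGA").
    """
    return sum(1 for here, nxt in zip(placement, placement[1:]) if here != nxt)


def bus_load(placement, link_rates):
    """Sum the data rates of only those inter-layer links that cross the bus.

    `link_rates[i]` is the (hypothetical) rate between layer i and layer i+1.
    """
    if len(link_rates) != max(len(placement) - 1, 0):
        raise ValueError("need exactly one rate per adjacent layer pair")
    pairs = zip(placement, placement[1:])
    return sum(rate for (here, nxt), rate in zip(pairs, link_rates) if here != nxt)


# Hypothetical five-layer stack, lowest layer first.
preserved = ["FPGA", "FPGA", "GPP", "GPP", "GPP"]      # hierarchy preserved
interleaved = ["GPP", "FPGA", "GPP", "FPGA", "GPP"]    # hierarchy not preserved
rates_mbps = [8.0, 4.0, 2.0, 1.0]                      # hypothetical link rates

print(bus_crossings(preserved), bus_load(preserved, rates_mbps))
print(bus_crossings(interleaved), bus_load(interleaved, rates_mbps))
```

With these made-up numbers the interleaved placement crosses the bus four times instead of once, so the bus must carry every inter-layer link rather than a single one.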
Figure 5 - Cognitive radio architecture featuring a small footprint
FPGA (left) where the usable space has been exhausted and a
large FPGA (right).
Figure 6 - Parallel search DSSS decoder. (Blocks shown: code
generator, a bank of correlators and integrators, threshold detector.)
Figure 7 - Serial search DSSS decoder. (Blocks shown: correlator,
integrator, code generator, threshold detector, control.)
It is also important to consider environments in which
there are mixed resources. Take, for example, a CR containing a
small, energy-efficient FPGA and a large, high-performance
FPGA, as shown in Figure 5. If the goal of the system were to
reduce power consumption, the CR may favor the
energy-efficient FPGA, but it will need to use the high-performance
FPGA when demands are high or large-footprint modules are
required. In the event that the CR uses both FPGAs, it must
strike a balance among three variables: module placement,
interconnect bandwidth, and power consumption.
The second design is a fully serial search direct-sequence
decoder, as shown in Figure 7. This decoder requires
one correlator and one integrator. It operates by time shifting
the spreading sequence one step at a time until it acquires a lock.
Acquisition time suffers as a result: the maximum
acquisition time is now equal to twice the code length times a
per-bit checking time [6].
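The serial search procedure can be sketched in a few lines. This is an idealized illustration rather than the decoder of Figure 7: it assumes a noise-free received signal, a short 7-chip m-sequence chosen only for the example, whole-chip search steps, and a hypothetical threshold. The function names are likewise assumptions.

```python
# A 7-chip m-sequence in +1/-1 form (example code, not from the paper).
CODE = [1, 1, 1, -1, 1, -1, -1]


def correlate(received, code, shift):
    """Correlate and integrate over one code period for a trial shift."""
    n = len(code)
    return sum(received[i] * code[(i - shift) % n] for i in range(n))


def serial_search(received, code, threshold):
    """Try one shift at a time with a single correlator.

    Returns (shift, trials); shift is None if no trial reaches the threshold.
    """
    for trials, shift in enumerate(range(len(code)), start=1):
        if correlate(received, code, shift) >= threshold:
            return shift, trials
    return None, len(code)


# Received signal: the code circularly delayed by an offset unknown to the receiver.
true_delay = 5
received = [CODE[(i - true_delay) % len(CODE)] for i in range(len(CODE))]

shift, trials = serial_search(received, CODE, threshold=len(CODE))
print(shift, trials)
```

The number of trials grows with the delay that has to be found, which is why the worst-case acquisition time scales with the code length when only one correlator is available.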
4. LOW-LEVEL ARCHITECTURE DECISIONS
Optimizations may also be made in lower levels of design.
Consider a direct-sequence spread spectrum (DSSS) decoder
[4]. There are three standard design approaches to the DSSS
decoder: serial synchronization, parallel synchronization, and
delay-locked loop [6, 7]. Each design requires differing
amounts of resources and features different acquisition times.
The first design to be considered is the direct-sequence
parallel search decoder, as shown in Figure 6. This model is
usually held up as the trivial case for DSSS decoders since it is
usually impractical to construct due to hardware cost, but in the
case of a CR with a PR FPGA, it is not out of the realm of
possibility. The direct-sequence parallel search decoder requires
two correlators and two integrators for every chip in the
spreading code sequence. It has been shown in [4] that once
synchronization has occurred the CR is free to remove the
DSSS block. The trade-off for all of this hardware expense is a DSSS
decoder capable of sequence acquisition with nearly instant
acquisition time [6, 7].
Figure 8 - Delay-locked loop DSSS decoder. (Blocks shown: code
generator; primary correlator using g(t); "early" correlator using
g(t+Tc/2); "late" correlator using g(t-Tc/2); integrators; comparator.)
The final design is an intermediate between the two
extremes. It is referred to as a delay-locked loop and is shown
in Figure 8. It requires three correlators, three integrators, and
one threshold detector. The delay-locked loop operates by
correlating with the chip sequence one half chip time behind
and ahead of the primary correlator. The results of these two
correlations are compared. The result is a time delay that is
used to advance or retard the chip sequence until a lock is
found. The maximum acquisition time is theoretically halved
since in effect the sequence is being searched from both ends
[6, 7].
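The early/late comparison at the heart of the delay-locked loop can be sketched as follows. This is a simplified illustration and not the circuit of Figure 8: it assumes a noise-free signal sampled at two samples per chip (so one sample equals half a chip), a 7-chip example code, and a starting estimate already within one chip of the true delay. All function names are hypothetical.

```python
CODE = [1, 1, 1, -1, 1, -1, -1]   # example 7-chip m-sequence, +1/-1 form
SAMPLES_PER_CHIP = 2              # one sample = half a chip time


def upsample(code, factor):
    """Repeat every chip `factor` times."""
    return [chip for chip in code for _ in range(factor)]


def delayed(signal, delay):
    """Circularly delay a signal by `delay` samples (negative = advance)."""
    n = len(signal)
    return [signal[(i - delay) % n] for i in range(n)]


def correlate(a, b):
    """Correlate and integrate over one code period."""
    return sum(x * y for x, y in zip(a, b))


def early_late_error(received, reference, estimate):
    """Late-minus-early correlation; zero when the estimate is centered."""
    early = correlate(received, delayed(reference, estimate - 1))
    late = correlate(received, delayed(reference, estimate + 1))
    return late - early


def track(received, reference, estimate, max_steps=10):
    """Advance or retard the local code until early and late balance."""
    for _ in range(max_steps):
        error = early_late_error(received, reference, estimate)
        if error == 0:
            break
        estimate += 1 if error > 0 else -1
    return estimate % len(reference)


reference = upsample(CODE, SAMPLES_PER_CHIP)
received = delayed(reference, 5)          # true delay: 5 half-chips
print(track(received, reference, estimate=3))
```

Outside the assumed one-chip window both correlations are equally small, so this comparison alone gives no direction; a coarse search would have to bring the estimate into range first.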
When designing a PR module for a CR, it is important to
carefully consider the design of that module. The
previous discussion of DSSS decoder design was meant to illustrate
that there is often more than one way to build a functional unit.
All three of these designs are valid choices, and the decision
depends on many variables. With a PR FPGA there may be value
in implementing all three designs, so that the CR can switch to
whichever method conditions favor.
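The trade-off among the three decoder designs can be summarized in a small selection sketch. The correlator counts and the relative acquisition times follow the descriptions in this section (two correlators per chip for parallel search, one for serial search, three for the delay-locked loop, and a halved maximum acquisition time for the delay-locked loop). Everything else is an assumption made for illustration: the names, the choice to model "nearly instant" as a single checking time, the selection rule, and the example numbers.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class DecoderDesign:
    name: str
    correlators: int
    max_acquisition_time_s: float


def candidate_designs(code_length: int, check_time_s: float) -> List[DecoderDesign]:
    """Build the three DSSS decoder options for a given code length."""
    serial_time = 2 * code_length * check_time_s
    return [
        # "Nearly instant" acquisition is modeled as one checking time (assumption).
        DecoderDesign("parallel search", 2 * code_length, check_time_s),
        DecoderDesign("delay-locked loop", 3, serial_time / 2),
        DecoderDesign("serial search", 1, serial_time),
    ]


def fastest_that_fits(designs, free_correlators: int) -> Optional[DecoderDesign]:
    """Pick the fastest design whose correlators fit the free FPGA budget."""
    fitting = [d for d in designs if d.correlators <= free_correlators]
    return min(fitting, key=lambda d: d.max_acquisition_time_s, default=None)


designs = candidate_designs(code_length=31, check_time_s=1e-3)
for budget in (100, 10, 2, 0):
    choice = fastest_that_fits(designs, budget)
    print(budget, choice.name if choice else None)
```

A real cognitive engine would weigh more than correlator count, for example integrators, routing, and power, but the same pattern of filtering by resource fit and then ranking by performance applies.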
7. CONCLUSIONS AND FUTURE WORK
This paper analyzed different considerations for
CR datapath development and resource sharing. Once the
hardware datapath is known for a particular CR design, it is up
to the designer to maximize flexibility and performance given
limited resources. This can be achieved by optimizing (a) the
placement of functional design units and (b) the design of the
functional blocks themselves to meet dynamic requirements.
Moreover, the limited resources must be carefully
considered in the optimization. In a dynamic system, designers
must consider using different modules at different times.
Future work will explore a more quantitative approach to
evaluating module design and placement and develop suitable
benchmarking techniques.
8. ACKNOWLEDGMENTS
The author would like to acknowledge the MS&T Intelligent
Systems Center for its financial support and his
advisor for continued guidance.
9. REFERENCES
[1] MacKenzie, Allen B., et al., 2009, "Cognitive Radio
and Networking Research at Virginia Tech,"
Proceedings of the IEEE, vol. 97, no. 4, pp. 660-688,
April 2009, doi: 10.1109/JPROC.2009.2013022, URL:
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=4812028&isnumber=4814763
[2] GNU Radio Project,
http://gnuradio.org/redmine/projects/gnuradio/wiki/WikiStart
[3] McDonald, E.J., "Runtime FPGA partial
reconfiguration," Aerospace and Electronic Systems
Magazine, IEEE, vol. 23, no. 7, pp. 10-15, July 2008,
doi: 10.1109/MAES.2008.4579286, URL:
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=4579286&isnumber=4579278
[4] McDonald, E.J.; Schlossberg, N.W.; Grayver, E.,
"Hardware accelerated multichannel receiver,"
Aerospace conference, 2009 IEEE, pp. 1-7,
7-14 March 2009, doi: 10.1109/AERO.2009.4839418, URL:
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=4839418&isnumber=4839294
[5] Filin, S.; Ishizu, K.; Harada, H., "IEEE draft standard
P1900.4a for architecture and interfaces for dynamic
spectrum access networks in white space frequency
bands: Technical overview and feasibility study,"
Personal, Indoor and Mobile Radio Communications
Workshops (PIMRC Workshops), 2010 IEEE 21st
International Symposium on, pp. 15-20,
26-30 Sept. 2010, doi: 10.1109/PIMRCW.2010.5670353, URL:
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5670353&isnumber=5670350
[6] Meel, ir. J., 1999, “Spread Spectrum (SS)
introduction,” Sirius Communications, Rotselaar, Belgium.
[7] Sklar, Bernard, 2001, Digital Communications:
Fundamentals and Applications, Prentice-Hall, Inc.,
Upper Saddle River, New Jersey, Chap 12.