Proceedings of the 7th Annual ISC Graduate Research Symposium
ISC-GRS 2013
April 24, 2013, Rolla, Missouri

COMPARATIVE ANALYSIS OF HARDWARE ACCELERATED COGNITIVE NETWORKS

Nathan Price
Department of Electrical and Computer Engineering
Missouri University of Science and Technology, Rolla, MO 65409

ABSTRACT
Wireless digital communication systems have long faced the problem of optimum bandwidth utilization in multi-user environments. Cognitive radio networks aim to address this problem through intelligent, flexible radio systems. Such adaptable radio systems should resolve tradeoffs between user-demanded quality of service (QoS) and spectrum conditions. In the past, cognitive radio systems implemented most of the signal processing and networking protocols in software. Such solutions rely on general-purpose processors to implement changing communication schemes at the cost of reduced throughput, increased delays, and no guarantee of a minimum QoS. On the other hand, many dedicated systems implement communication schemes in hardware using one-time-programmable chips. Recent works consider a dynamic environment where certain functions and schemes can be implemented either in hardware on an FPGA or in software, depending on the required network performance and the available hardware resources. This work investigates the design challenges of blending hardware and software in an efficient manner across two different hierarchical levels of design.

1. INTRODUCTION
Cognitive radio (CR) systems are digital radio networks with an intelligent control system for channel access. This control system, or “cognitive engine,” can be viewed as a smart resource dispatcher. It checks spectrum conditions and makes decisions about where and how to transmit [1]. For example, in a busy band of radio spectrum it may decide to use a narrow-bandwidth modulation to avoid interference with neighbors. Alternatively, it may choose to transmit only on free channels rather than occupied ones, or it may decide to use another band entirely. Furthermore, a CR system should continuously adapt to changing environmental conditions. For example, when a high noise level reduces the SNR, a CR system might adapt by increasing transmit power, increasing forward error correction, changing modulation techniques, or combining these measures. However, such a wide range of adaptability is a challenge for conventional radio electronics.

A typical cognitive radio platform employs an FPGA-based software-defined radio (SDR). An SDR is simply a radio system in which most of the digital signal processing is accomplished in software, as shown in Figure 1. Conventional radio components, including filters and modulators, are implemented using discrete-time signal processing techniques in software [2]. An SDR’s data path usually consists of a simple transceiver feeding ADCs that then connect to a general-purpose processor (GPP). Entire radio systems can be created and modified in software without the need for purpose-built hardware. SDRs usually consist of a real-time software framework upon which libraries of dedicated radio objects are built. These objects are then linked together to form a radio system. However, the drawback of such a flexible, software-centered solution is its computational overhead and often uncertain performance.

Figure 1 - Typical architecture model for a cognitive radio.

These systems should run in real time; that is, each discrete-time radio sample has to be processed and handled at a rate greater than or equal to the sampling rate of the system. Consequently, the processing overhead quickly increases with demand. Using processors with higher computational power often leads to increased power requirements. This is especially detrimental in mobile, battery-operated platforms, where the power budget limits the available processing power. This helps to explain why SDRs have yet to become commercially viable in a consumer market.

Figure 2 - A) Typical software defined radio datapath. B) Software defined radio datapath with the addition of an FPGA.

Many SDR designs are equipped with field-programmable gate arrays (FPGAs), as shown in Figure 2.B. The intention is to implement the common, fixed processing steps in hardware for improved performance while maintaining the flexibility of reprogramming the FPGA when an improved algorithm is developed. The reconfigurable nature of FPGAs allows purpose-built, dedicated hardware to be quickly instantiated to meet the demands of modern communication schemes. Moreover, FPGAs meet the high computational power requirements of high-speed digital communication since they can be configured as dedicated single-function hardware that can run orders of magnitude faster and more efficiently than a GPP. However, existing works employ the FPGA to implement only static functions (configuration). The proposed work explores the idea of on-the-fly reprogramming of the FPGA to adapt to varying network demands and channel conditions.

2. CURRENT RESEARCH
New FPGA technology holds the potential for extremely flexible yet efficient CR hardware. Newer FPGAs have begun to offer partial reconfiguration (PR), meaning that the FPGA’s design can be altered at run time through the use of modular design blocks [3]. Similar to how SDRs have software libraries of radio components that can be called as required, a PR FPGA can have PR modules stored in local memory that can be deployed on chip as demanded by the system [3, 4]. While this design is technically still “software defined,” it is implemented on faster and more efficient hardware.

The combination of traditional SDRs and PR FPGAs to handle heavy processing is a new and promising topic in digital communications. A general overview is shown in Figure 3, where various modules are implemented either on the FPGA or on the GPP. Virginia Tech’s CR research [1] has looked at the challenges of CR systems as a whole as well as the fine details of on-the-fly reconfiguration of FPGAs. These fine details include placement of modules and interconnect routing. The IEEE has also published its P1900.4 draft specification for a CR network. Even this early specification calls for the use of PR FPGAs [5]. The work in [4] looks at the state of the FPGA industry for partial reconfiguration by exploring Xilinx’s design tools and hardware in a few different CR scenarios.

Figure 3 - Cognitive radio datapath with a load distributed across a general-purpose processor (GPP) and an FPGA.

Spanning multiple levels of a CR system is the issue of performance and resource allocation. The high-level architecture of a CR consists of the cognitive engine, processing hardware, transceivers, and the links between these components. CR systems may consist of one or multiple instances of each component depending on the network state and application requirements. In contrast, low-level design decisions involving PR modules and SDR objects allow higher flexibility at the cost of increased complexity of the optimization problem. In general, the decision-making would consider tradeoffs between performance and resource optimization. A PR module may be built in a very high-performance, parallel manner and consume much of an FPGA’s available resources, or it may be built in a more resource-conservative, serial fashion but run slower.

3. HIGH-LEVEL ARCHITECTURE DECISIONS
In this section, we consider a high-level optimization of the hardware-software architecture. In traditional CR approaches, the FPGA is programmed with as many of the required radio modules as the FPGA resources allow. While [4] proposes novel methods of swapping a direct-sequence spread spectrum (DSSS) block in and out on a receiver, the authors do not consider the case in which all required radio modules simply will not fit on the FPGA despite leveraging PR. In such overflow cases, the CR would have to fall back to a hybrid system based partially in software and partially in hardware. Resource-intensive operations, including down-conversion and filtering, should be performed on the FPGA, while less intensive operations can be performed on the GPP. As demand on the system is reduced, the overflow processes can be shifted back to the FPGA.

Additionally, secondary objectives of the CR, for example energy consumption, often have to be taken into consideration. In the previously mentioned scenario, the end goal is overall performance, which may suffice for a CR serving as a network controller, but in a mobile device minimal energy consumption becomes an important optimization objective. There may be a better configuration in which the mobile device’s GPP performs most or all of the computation while the FPGA is nearly or completely idle and powered down.

Figure 4 - Left) A cognitive radio architecture splitting workloads across the system bus where layer hierarchy is preserved. Right) A cognitive radio architecture splitting workloads where layer hierarchy is not preserved. Notice the difference in bus transitions.

Another component in CR architecture that cannot be overlooked is the system bus. CR systems will contain some version of a networking stack. Each layer and/or sublayer of this stack will be instantiated as either an SDR object or a PR module.
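Because every layer ends up on one processor or the other, data moving through the stack crosses the system bus whenever two adjacent layers sit on different processors, which is the difference Figure 4 draws attention to. The following Python sketch counts those crossings for a given placement. It is only an illustration: the function names, the five-layer example stacks, and the assumption that every crossing carries the full data stream are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def count_bus_transitions(placement: Sequence[str]) -> int:
    """Count adjacent layer pairs that are placed on different processors."""
    return sum(1 for lower, upper in zip(placement, placement[1:]) if lower != upper)


def bus_load(placement: Sequence[str], bytes_per_second: float) -> float:
    """Estimate bus traffic, assuming each crossing carries the full data stream."""
    return count_bus_transitions(placement) * bytes_per_second


# Hypothetical five-layer stacks, listed from the lowest layer upward.
hierarchy_preserved = ["FPGA", "FPGA", "FPGA", "GPP", "GPP"]
hierarchy_broken = ["FPGA", "GPP", "FPGA", "GPP", "FPGA"]

print(count_bus_transitions(hierarchy_preserved))  # 1
print(count_bus_transitions(hierarchy_broken))     # 4
```

In this toy case the placement that ignores the layer hierarchy crosses the bus four times instead of once, so a placement that looks better per layer can still lose overall once bus traffic is counted.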
Depending on the nature of the layer, it may run more efficiently on the GPP or on the FPGA. It is possible that processing efficiency will not follow the network layer hierarchy; that is, lower layers may run better on the GPP while higher layers may run better on the FPGA. In such a case, it is important to consider both the speed and the bandwidth of the link between the two processing units. The demand on the interconnect bus increases with the number of layer transpositions, which can turn the bus into a communication bottleneck.

Figure 5 - Cognitive radio architecture featuring a small footprint FPGA (left) where the usable space has been exhausted and a large FPGA (right).

It is also important to consider environments in which there are mixed resources. Take, for example, a CR containing a small, energy-efficient FPGA and a large, high-performance FPGA, as shown in Figure 5. If the goal of the system is to reduce power consumption, the CR may favor the energy-efficient FPGA, but it will need to use the high-performance FPGA when demands are high or large-footprint modules are required. In the event that the CR uses both FPGAs, it must strike a balance among three variables: module placement, interconnect bandwidth, and power consumption.

4. LOW-LEVEL ARCHITECTURE DECISIONS
Optimizations may also be made at lower levels of design. Consider a direct-sequence spread spectrum (DSSS) decoder [4]. There are three standard design approaches to the DSSS decoder: serial synchronization, parallel synchronization, and the delay-locked loop [6, 7]. Each design requires a different amount of resources and features a different acquisition time.

The first design to be considered is the direct sequence parallel search decoder, shown in Figure 6. This model is usually held up as the trivial case for DSSS decoders since it is usually impractical to construct due to hardware cost, but in the case of a CR with a PR FPGA it is not out of the realm of possibility. The direct sequence parallel search decoder requires two correlators and two integrators for every chip in the spreading code sequence. It has been shown in [4] that once synchronization has occurred the CR is free to remove the DSSS block. The payoff for all of this hardware expense is a DSSS decoder capable of sequence acquisition with a nearly instant acquisition time [6, 7].

Figure 6 - Parallel search DSSS decoder

The second design is a completely serial search-based direct sequence decoder, as shown in Figure 7. This decoder requires one correlator and one integrator. It operates by time-shifting the spreading sequence in steps until it acquires a lock. The maximum acquisition time suffers as a result: it is now equal to twice the code length times the per-bit checking time [6].

Figure 7 - Serial search DSSS decoder

The final design is an intermediate between the two extremes. It is referred to as a delay-locked loop and is shown in Figure 8. It requires three correlators, three integrators, and one threshold detector. The delay-locked loop operates by correlating with the chip sequence one half chip time behind and one half chip time ahead of the primary correlator. The results of these two correlations are compared. The result is a time delay that is used to advance or retard the chip sequence until a lock is found. The maximum acquisition time is theoretically halved since, in effect, the sequence is being searched from both ends [6, 7].

Figure 8 - Delay-locked loop DSSS decoder

When designing a PR module for a CR, it is important to carefully consider the design of that module.
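The resource counts and worst-case acquisition times quoted for the three decoders can be tabulated and compared against an FPGA budget. The Python sketch below is a rough illustration under stated assumptions rather than the paper's method: all names are hypothetical, the "nearly instant" parallel acquisition is modeled as a single checking time, the delay-locked loop is assumed to halve the serial search time, and the number of correlators stands in for FPGA area.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class DecoderOption:
    name: str
    correlators: int
    integrators: int
    max_acquisition_time: float  # seconds


def decoder_options(code_length: int, check_time: float) -> List[DecoderOption]:
    """Tabulate the three DSSS decoder designs for a given spreading code."""
    # Serial search: twice the code length times the per-bit checking time.
    serial_time = 2 * code_length * check_time
    return [
        # Parallel search: two correlators and two integrators per chip.
        # "Nearly instant" is modeled as one checking time (assumption).
        DecoderOption("parallel search", 2 * code_length, 2 * code_length, check_time),
        # Delay-locked loop: half of the serial search time (assumed baseline).
        DecoderOption("delay-locked loop", 3, 3, serial_time / 2),
        DecoderOption("serial search", 1, 1, serial_time),
    ]


def fastest_fitting(options: List[DecoderOption],
                    correlator_budget: int) -> Optional[DecoderOption]:
    """Return the fastest design whose correlators fit the budget, if any."""
    fitting = [o for o in options if o.correlators <= correlator_budget]
    if not fitting:
        return None
    return min(fitting, key=lambda o: o.max_acquisition_time)


if __name__ == "__main__":
    options = decoder_options(code_length=31, check_time=1e-3)
    for budget in (100, 10, 2):
        choice = fastest_fitting(options, budget)
        print(budget, choice.name if choice else "none")
```

With a hypothetical 31-chip code, a budget of 100 correlators admits the parallel search, 10 correlators falls back to the delay-locked loop, and 2 correlators leaves only the serial search.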
The previous discussion of DSSS decoder design was meant to illustrate that there is often more than one way to build a functional unit. All three of these designs are valid choices, and the decision depends on many variables. With a PR FPGA, there may be value in implementing all three designs so that the CR can switch to whichever method current conditions favor.

7. CONCLUSIONS AND FUTURE WORK
This paper analyzed different considerations for CR datapath development and resource sharing. Once the hardware datapath is known for a particular CR design, it is up to the designer to maximize flexibility and performance given limited resources. This can be achieved by optimizing (a) the placement of functional design units and (b) the design of the functional blocks themselves to meet dynamic requirements. Moreover, the limited resources must be carefully considered in the optimization. In a dynamic system, designers must consider using different modules at different times. Future work will explore a more quantitative approach to evaluating module design and placement and develop suitable benchmarking techniques.

8. ACKNOWLEDGMENTS
The author would like to acknowledge the MS&T Intelligent Systems Center for its financial support and his advisor for continued guidance.

9. REFERENCES
[1] MacKenzie, Allen B., et al., 2009, "Cognitive Radio and Networking Research at Virginia Tech," Proceedings of the IEEE, vol. 97, no. 4, pp. 660-688, April 2009, doi: 10.1109/JPROC.2009.2013022, URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=4812028&isnumber=4814763
[2] GNU Radio Project, http://gnuradio.org/redmine/projects/gnuradio/wiki/WikiStart
[3] McDonald, E.J., "Runtime FPGA partial reconfiguration," Aerospace and Electronic Systems Magazine, IEEE, vol. 23, no. 7, pp. 10-15, July 2008, doi: 10.1109/MAES.2008.4579286, URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=4579286&isnumber=4579278
[4] McDonald, E.J.; Schlossberg, N.W.; Grayver, E., "Hardware accelerated multichannel receiver," Aerospace conference, 2009 IEEE, pp. 1-7, 7-14 March 2009, doi: 10.1109/AERO.2009.4839418, URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=4839418&isnumber=4839294
[5] Filin, S.; Ishizu, K.; Harada, H., "IEEE draft standard P1900.4a for architecture and interfaces for dynamic spectrum access networks in white space frequency bands: Technical overview and feasibility study," Personal, Indoor and Mobile Radio Communications Workshops (PIMRC Workshops), 2010 IEEE 21st International Symposium on, pp. 15-20, 26-30 Sept. 2010, doi: 10.1109/PIMRCW.2010.5670353, URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5670353&isnumber=5670350
[6] Meel, ir. J., 1999, “Spread Spectrum (SS) introduction,” Sirius Communications, Rotselaar, Belgium.
[7] Sklar, Bernard, 2001, Digital Communications: Fundamentals and Applications, Prentice-Hall, Inc., Upper Saddle River, New Jersey, Chap. 12.