Bridging the Energy Gap in Size, Weight and Power Constrained Software Defined Radio: Agile Baseband Processing as a Key Enabler
Bruno Bougard, Min Li, David Novo, Liesbet Van der Perre and Francky Catthoor
Bruno Bougard et al. Athens, May 2008

The number of standards to implement in a single handset increases dramatically
[Figure: wireless standards placed by mobility (stationary, walking, driving) against data rate (0.1 to 100 Mbps): GSM, GPRS, EDGE, UMTS, HSxPA, 3G LTE, IEEE 802.16a,d, IEEE 802.16e, WLAN (IEEE 802.11b), WLAN (IEEE 802.11a/g/n), DECT, BlueTooth]

All cost factors direct towards high-volume programmable solutions everywhere possible
[source: ICERA]

Two barriers remain
• The energy gap
• The exploding complexity
[Figure: progression over 2G, .11b, 3G, .11g, 3G+, .11n, MIMO, LTE?]

Most SWPC SDR research focuses on more energy efficient processor architectures
[Figure: efficiency versus flexibility plane populated by ASICs, ASIPs, VLIW/DSPs, FPGAs, RISCs and GPPs]
• NXP onDSP, EVP
• Sandbridge Sandblaster SB3011
• SiliconHive CSP2200
• Infineon MUSIC
• Icera DXP
• Nokia VectorASIP
• UMich SODA
• ULinkoping/CORESONICS BBE2
• TUDresden SAMIRA
• …

Radio Baseband Platform Requirements
• Low Cost
  – Long HW lifespan
  – Short SW deployment time
  – Scalable HW/SW
• Energy Aware
  – Energy aware HW
  – Energy aware algorithms
  – Energy aware protocols
  – Techno-aware power management
• Spectrum Agile
  – Versatile RX digital front end
  – Versatile TX digital front end
  – Powerful MAC/RLC/QoS Ctrl

Outline
• IMEC SDR Baseband Platform
• Wanted: Platform Aware Signal Processing
• Case study 1: OFDMA transmitter
• Case study 2: OFDMA receiver
• Case study 3: Dynamic fixed-point format assignment
IMEC SDR Baseband Platform

Where do you need flexibility? Where do you need energy efficiency?
[Figure: baseband functions placed by need for flexibility (diversity/versatility) against need for energy efficiency (duty cycle): Modulation/Demodulation (Inner Modem), Synchronization, Forward Error Correction (Outer Modem), FE steering, Signal detection]

IMEC MIMO-capable SDR baseband platform
• Target standards: 802.11n and next gen., 802.16e and next gen., 3GPP LTE, DVB-H/T
• Flexible platform
  – Up to 3 antennas
  – Up to 200Mbps
  – <500mW

Two Programmable CGA Processor Cores at its heart
[Figure: die layout, 2.55 mm x 2.27 mm, showing CGA config mem, I$, core logic (including register files), AHB and L1 scratchpad]
• 32KB I$
• 128KB IMEM
• 128-entries CMEM
• 64KB L1 data scratchpad
• 4x4 64-bit 4-way SIMD CGA
• VLIW and CGA modes of operation
• C-programmable
• 25 (theoretical) GOPS
• 46MOPS/mW
• TSMC 90G
• Dual VT and substrate biasing for leakage reduction in sleep mode
• Clock rate 400MHz WCC
• Total Area: 6 sqmm
• Power consumption
  – Active TC VLIW 75mW
  – Active TC CGA 300mW
  – Leakage @ T=65C 25mW
  – Leakage in standby <10mW

200 Mbps+ SDR application driver
• IEEE 802.11n digital inner modem receiver
  – Channel bonding 40MHz
  – 2 antennas MIMO SDM OFDM

# ant.   mod. scheme   cod. rate   SNR [dB] at BER = 10^-3
1        BPSK          1/2         3.0
1        QPSK          1/2         6.5
1        16QAM         1/2         12.5
1        64QAM         3/4         22.3
2        BPSK          1/2         5.5
2        QPSK          1/2         11.5
2        16QAM         1/2         18.0
2        64QAM         3/4         34.0

Profiling of SDR benchmarks and of the full OFDM application proves real-time operation @100Mbps
[Figure: 2-antenna SDM-OFDM @100Mbps, execution time @ 400MHz (µs, axis 0 to 18) per kernel: acorr, fshift, xcorr, fft (2x), freq offset estim., freq offset comp, SDM MMSE (2x), tracking, QAM demap, total preamble processing, total per symbol processing]

Great benefit in area but power higher than dedicated hardware solutions
[Figure: bar charts for 802.11n, 802.16e, DVB-H, 11n&16e and all standards; legends: SDR (IMEC), Reconf. (Intel), ASIC (Atheros) and SDR (IMEC), ASIC (source: Intel)]
• Active Power VLIW: 75mW
• Active Power CGA: 300mW
• Leakage Power: 25mW

The interconnection network dominates the power consumption in VLIW and CGA modes

Component                                      VLIW mode   CGA mode
VLIW ctrl                                      0%          0%
FU VLIW                                        22%         4%
FU CGA                                         2%          25%
Interconnect + mux (CGA mode: + pipeline)      28%         38%
VLIW reg                                       21%         6%
CGA reg                                        2%          2%
I$                                             10%         1%
CMEM                                           0%          13%
DMEM                                           13%         10%
peripherals                                    2%          1%
Active power                                   75mW        300mW
Leakage power                                  25mW        25mW

Wanted: Platform Aware Signal Processing

Wanted: SDR-Platform Aware Signal Processing
[Figure: Elephant as Platform, Horse as Platform]

Dynamic signal processing implementation
[Figure: 3GPP channel response and cycle count on SoA processor, both over time]
Wanted: SDR-Platform Aware Signal Processing
ASIC as platform
• Requires simple control flow
• Requires manifest and regular computation structures
• Maximum functional reuse is a must
• Minimum data wordwidth
• Accommodates high computation loads
• Highest energy efficiency
SDR as platform
• Accommodates more complex control flows
• Accommodates complex and irregular computation structures
• Functional reuse not a must (reuse memory footprint only)
• Aligned data wordwidth
• Limited maximum computation load
• Lower energy efficiency

Algorithm-Architecture Co-Design
• Make the algorithm compatible with architecture/compiler constraints
• Exploit the opportunities of a programmable architecture

Observation
[Figure: channel]
• Wireless baseband processing implies high dynamics
• Wireless baseband processing tolerates inaccuracy
• This is already considered at system level (X-layer), but what about in the signal processing implementation?

The opportunity
[Figure: ASIC baseband with low structure complexity versus SDR baseband with high structure complexity]
• Two viewpoints toward complexity
  – Computation complexity and memory complexity
  – Structure complexity (control flow, heterogeneity, etc.)
• Wireless systems can cope with inaccuracy (“scalable” QoS)
• On SDR
  – Computation complexity is much more costly than in ASIC
  – Memory complexity is as costly as in ASIC
  – Structure complexity is much less costly than in ASIC
• What can we do?
  – Increase the structure complexity of baseband processing to reduce the average computation and memory complexity, by enabling run-time adaptation of the algorithm implementation to the dynamics in QoS requirements, environment (and platform)
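Worked example (illustrative, not from the slides): the argument that extra structure complexity lowers the average load can be written as an expectation over operating scenarios. The symbols S, p_s, C_s and the numbers below are assumptions chosen only to show the arithmetic.

```latex
% p_s : assumed fraction of time spent in scenario s
% C_s : assumed computation load of the implementation used in scenario s
\[
  \bar{C} \;=\; \sum_{s=1}^{S} p_s\, C_s \;\le\; \max_{s} C_s \;=\; C_{\max}
\]
% Hypothetical numbers: p = (0.5, 0.3, 0.2), C = (0.4, 0.7, 1.0) C_max
\[
  \bar{C} \;=\; (0.5 \cdot 0.4 + 0.3 \cdot 0.7 + 0.2 \cdot 1.0)\, C_{\max} \;=\; 0.61\, C_{\max}
\]
```

A design fixed at the worst case always pays C_max. A run-time adaptive design pays the weighted average, at the price of storing several implementations and a selector, which is the structure complexity that the slide argues is cheap on SDR.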
Case study 1: OFDMA transmitter

Motivation: OFDMA modulation error requirements vary
[Figure: WiMAX specification]
• Modulation accuracy can be relaxed for lower order modulation

RCE relaxation can be exploited by a scalable digital OFDMA modulator
• Original: a large-size (e.g., 1024) IFFT based non-scalable modulator
• Transformed: a scalable OFDMA modulator with 3 cascaded components
• The interpolation factor can be used as a knob to adjust the accuracy and computation load to the RCE requirement

Computation load scales smoothly with the interpolation factor
[Figure: normalized cycle count versus interpolation factor]

Case study 2: OFDMA receiver

OFDMA mod./demod. requires (I)FFT with partial input/output
• The position and number of bins change dynamically

Efficient Partial FFT on ILP Architectures
• Exploit the partial input/output to reduce active instructions and memory accesses
• 30 years of theoretical research on partial FFT (PFFT) but few implementations
• We propose a generic and efficient scheme for PFFT on ILP architectures
  – Any pattern of bin-distribution can be implemented

The proposed scheme brings important gains in almost all implementation cost factors and scales smoothly with the number of sub-carriers to be processed

The price to pay is a higher instruction cache miss rate (acceptable)
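To illustrate what "partial output" means in case study 2, here is a minimal sketch of a textbook output-pruned radix-2 decimation-in-time FFT. It is not the authors' scheme for ILP architectures, which the slides do not detail. The function names and the `counter` argument are hypothetical helpers for this illustration.

```python
import cmath

def partial_fft(x, bins, counter=None):
    """Return {k: X[k]} for the requested output bins only.

    len(x) must be a power of two. Any pattern of bins is accepted.
    counter, if given, is a one-element list that accumulates the
    number of twiddle multiplications actually performed.
    """
    n = len(x)
    bins = sorted(set(bins))
    if n == 1:
        return {0: complex(x[0])}
    half = n // 2
    # Only these bins of the two half-size transforms are ever needed.
    needed = {k % half for k in bins}
    even = partial_fft(x[0::2], needed, counter)
    odd = partial_fft(x[1::2], needed, counter)
    out = {}
    for k in bins:
        w = cmath.exp(-2j * cmath.pi * k / n)  # twiddle factor W_n^k
        out[k] = even[k % half] + w * odd[k % half]
        if counter is not None:
            counter[0] += 1
    return out

def dft_bin(x, k):
    """Direct DFT of one bin, used only as a reference for checking."""
    n = len(x)
    return sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))

if __name__ == "__main__":
    samples = [complex(i % 7, -(i % 3)) for i in range(64)]
    wanted = [8, 9, 10, 11, 40]          # arbitrary, non-contiguous bins
    pruned_ops, full_ops = [0], [0]
    result = partial_fft(samples, wanted, pruned_ops)
    partial_fft(samples, range(64), full_ops)
    print("multiplications:", pruned_ops[0], "pruned vs", full_ops[0], "full")
```

The work saved grows as fewer bins are requested, because whole branches of the butterfly graph are never evaluated. The bin set is a run-time argument, which matches the slide's point that the position and number of bins change dynamically.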
Case study 3: Dynamic fixed-point format assignment

State-of-the-art
• Automatic floating-point to fixed-point conversion (>30 years of work)
  – Commercial products: Catalytic Inc. & Mathworks
  – Recent academic contributions:
    • Simulation-based: Seoul National Univ. (‘95)
    • Analytical methods: Aachen (‘98), Northwest Univ. (‘01)
    • Hybrid methods: Imperial College (‘03), Berkeley (‘04) and ENSSAT (‘05)
• Run-time word-length selection: receiver VLSI architecture based on a control feedback loop, Hokkaido University (‘06) [Yoshizawa, S. et al. ISCAS’06]

Modeling of the fixed-point communication system
• Performance of the communication system as a function of the receiver SNR
  – BER = f(SNR)
• Fixed-point refined system includes quantization noise
  – BER = f(SNR, na, nb, …) = f’(SNR) ≈ f(SNR’)
• Implementation scenarios defined and optimized at design time
[Figure: data-flow graph with signals a, b, c and quantization noise sources na, nb, nc]
[Figure: throughput [Mbps] (0 to 120) versus SNR [dB] (0 to 40) for scenarios A, B, C, D: BPSK 1/2, QPSK 1/2, 16QAM 1/2, 64QAM 2/3]

Opportunity: application dynamics and tolerance to inaccuracy can be propagated to the implementation
SYSTEM LEVEL
• Multiple link parameters trade off noise/interference robustness versus data rate
[Figure: throughput [Mbps] (0 to 120) versus SNR [dB] (0 to 40) for scenarios A, B, C, D: BPSK 1/2, QPSK 1/2, 16QAM 1/2, 64QAM 2/3]
IMPLEMENTATION LEVEL
• Different system configurations have different requirements in [digital] signal processing accuracy, so they can use different implementations
[Figure: TX, channel with additive noise, FE, DSP and RX chain; SNR labels the analog side, #bits the digital side]
• We adapt the application fixed-point mapping at run-time
• By switching between the “mappings”, the average load is reduced
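The relation BER ≈ f(SNR’) can be illustrated with a small sketch in which quantization noise is folded into an effective SNR. Everything here is a simplifying assumption rather than the authors' model: a single quantization point instead of the per-signal sources na, nb, …, the 6.02·b + 1.76 dB rule of thumb for a full-scale sinusoid through an ideal uniform quantizer, and independent additive noise sources. All function names are hypothetical.

```python
import math

def db_to_lin(db):
    return 10.0 ** (db / 10.0)

def lin_to_db(x):
    return 10.0 * math.log10(x)

def sqnr_db(bits):
    # Assumption: rule of thumb for a full-scale sinusoid and an ideal
    # uniform quantizer. Real OFDM signals need extra headroom.
    return 6.02 * bits + 1.76

def effective_snr_db(channel_snr_db, bits):
    # Assumption: channel noise and quantization noise are independent,
    # so their powers (relative to the signal) add up.
    noise = 1.0 / db_to_lin(channel_snr_db) + 1.0 / db_to_lin(sqnr_db(bits))
    return lin_to_db(1.0 / noise)

def min_word_length(required_snr_db, channel_snr_db, candidates=range(2, 17)):
    """Smallest candidate word length whose SNR' still meets the requirement."""
    for bits in candidates:
        if effective_snr_db(channel_snr_db, bits) >= required_snr_db:
            return bits
    return None  # no candidate is accurate enough at this channel SNR

if __name__ == "__main__":
    # Required SNRs taken from the single-antenna rows of the 802.11n table
    # earlier in the deck; the 1 dB channel margin is a hypothetical choice.
    for mode, required in [("BPSK 1/2", 3.0), ("64QAM 3/4", 22.3)]:
        print(mode, "->", min_word_length(required, required + 1.0), "bits")
```

Under these assumptions the robust low-rate mode tolerates a shorter word length than the high-rate mode, which is the effect the run-time fixed-point mapping exploits.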
SDR enables more agile signal processing implementations
[Figure: the ENVIRONMENT (channel attenuation over frequency and time) is observed by a Monitor; monitoring info on current conditions and the QoS req. feed a Run-time controller, which adapts the data format through a scenario select on the SDR PLATFORM, where program memory holds impl. A, impl. B, ..., impl. N of each BB func. for the SDR processor]
• Several sw implementations of the same functionality with different precision/computation load
• Monotonic relation between precision and load
• One can switch between sw implementations in a few cycles

Dynamic fixed-point format assignment increases energy efficiency in situations requiring lower performance

Increase in scalability
• Energy efficiency increased at lower rate modes
• Average energy consumption is reduced

Conclusions
• Energy efficiency of flexible implementations comes closer to that of their dedicated hardware counterparts:
  – Has the potential to continuously best-fit the dynamism
• Does not rely on hypothetical provision in the standards:
  – Implementation centric
  – Applicable to any functional-level algorithmic solution
• Wireless systems context today but also other domains tomorrow:
  – Digital signal processing with an SNR-type constraint and dynamic data resolution variation
  – Biomedical signal processing, multimedia, etc.
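Sketch of the run-time controller from the slide "SDR enables more agile signal processing implementations": a minimal illustration of selecting among several software implementations ordered by precision and load. The implementation names, precision thresholds and cycle counts are hypothetical placeholders, not measured values from the platform.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Impl:
    name: str
    precision_db: float  # hypothetical: highest required SNR this mapping serves
    cycles: int          # hypothetical: cycles per processed symbol

# Ordered by increasing precision and, monotonically, increasing load.
IMPLS = [
    Impl("impl_A", 8.0, 400),
    Impl("impl_B", 15.0, 700),
    Impl("impl_N", 35.0, 1200),
]

def select(required_snr_db, impls=IMPLS):
    """Pick the cheapest implementation that is precise enough."""
    for impl in impls:
        if impl.precision_db >= required_snr_db:
            return impl
    return impls[-1]  # fall back to the most precise mapping

def average_cycles(required_snr_trace, impls=IMPLS):
    """Average load when the controller re-selects for every entry of the trace."""
    loads = [select(req, impls).cycles for req in required_snr_trace]
    return sum(loads) / len(loads)

if __name__ == "__main__":
    # Hypothetical trace of monitored requirements (dB), one entry per frame.
    trace = [3.0, 6.5, 12.5, 22.3]
    print("adaptive average:", average_cycles(trace), "cycles")
    print("worst-case design:", IMPLS[-1].cycles, "cycles")
```

Because precision and load are monotonically related, a linear scan over the ordered table is enough. On the platform the selection would amount to switching the scenario pointer into program memory, which the slide states takes only a few cycles.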