IMEC ILP Technology ADRES - DRESC

Bridging the Energy Gap in Size, Weight and
Power Constrained Software Defined Radio:
Agile Baseband Processing as a Key Enabler
Bruno Bougard, Min Li, David Novo,
Liesbet Van der Perre and Francky Catthoor
The number of standards to implement in a single
handset increases dramatically
[Figure: wireless standards plotted by mobility (stationary, walking, driving) against data rate (0.1 to 100 Mbps): GSM, GPRS, EDGE, UMTS, HSxPA, 3G LTE, IEEE 802.16a,d, IEEE 802.16e, WLAN (IEEE 802.11b), WLAN (IEEE 802.11a/g/n), DECT, BlueTooth]
Bruno Bougard et al.
Athens, May 2008
All cost factors point towards high-volume
programmable solutions wherever possible
[source: ICERA]
Two barriers remain
• The energy gap
• The exploding complexity
[Figure: growth across standards generations; labels: 2G, 3G, 3G+, LTE?, .11b, .11g, .11n, MIMO]
Most SWPC SDR research focuses on more energy
efficient processor architectures
[Figure: efficiency versus flexibility for ASICs, ASIPs, VLIW/DSPs, FPGAs, RISCs and GPPs; a "?" marks the open region]
• NXP onDSP, EVP
• Sandbridge Sandblaster SB3011
• SiliconHive CSP2200
• Infineon MUSIC
• Icera DXP
• Nokia VectorASIP
• UMich SODA
• ULinkoping/CORESONICS BBE2
• TUDresden SAMIRA
• …
Radio Baseband Platform Requirements
• Low Cost
– Long HW lifespan
– Short SW deployment time
– Scalable HW/SW
• Energy Aware
– Energy aware HW
– Energy aware algorithms
– Energy aware protocols
– Techno-aware power management
• Spectrum Agile
– Versatile RX digital front end
– Versatile TX digital front end
– Powerful MAC/RLC/QoS Ctrl
Outline
• IMEC SDR Baseband Platform
• Wanted: Platform Aware Signal Processing
• Case study 1: OFDMA transmitter
• Case study 2: OFDMA receiver
• Case study 3: Dynamic fixed-point format
assignment
Where do you need flexibility?
Where do you need energy efficiency?
[Figure: baseband functions placed by diversity/versatility against duty cycle: modulation/demodulation (inner modem), synchronization, forward error correction (outer modem), FE steering, signal detection]
Where do you need flexibility?
Where do you need energy efficiency?
[Figure: baseband functions placed by need in flexibility against need in energy efficiency: modulation/demodulation (inner modem), synchronization, forward error correction (outer modem), FE steering, signal detection]
IMEC MIMO-capable SDR baseband platform
[Figure: flexible platform covering 802.11n and next gen., 802.16e and next gen., 3GPP LTE, DVB-H/T]
• Up to 3 antennas
• Up to 200Mbps
• <500mW
Two Programmable CGA Processor Cores at its heart
[Figure: die layout, 2.55 mm x 2.27 mm, showing core logic (including register files), CGA config mem, I$, L1 scratchpad and AHB]
• 4x4 64-bit 4-way SIMD CGA
• VLIW and CGA mode of operations
• C-programmable
• 32KB I$
• 128KB IMEM
• 128-entries CMEM
• 64KB L1 data scratchpad
• TSMC 90G
• Dual VT and substrate biasing for leakage reduction in sleep mode
• Clock rate 400MHz WCC
• Total Area: 6 sqmm
• 25 (theoretical) GOPS
• 46MOPS/mW
• Power consumption
– Active TC VLIW: 75mW
– Active TC CGA: 300mW
– Leakage @ T=65C: 25mW
– Leakage in standby: <10mW
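The theoretical throughput quoted on this slide can be cross-checked from the array dimensions. The following is a minimal sketch, assuming each functional unit completes one operation per SIMD lane per cycle (a counting rule the slide does not state) and that the 64-bit datapath is split evenly across the four lanes.

```python
# Figures taken from the slide; the counting rule (one operation per
# SIMD lane per functional unit per cycle) is an assumption.
ROWS, COLS = 4, 4        # 4x4 array of functional units
SIMD_LANES = 4           # 4-way SIMD
DATAPATH_BITS = 64       # 64-bit datapath
CLOCK_HZ = 400e6         # 400 MHz clock


def peak_gops(rows, cols, lanes, clock_hz):
    """Peak operations per second, in units of 1e9."""
    return rows * cols * lanes * clock_hz / 1e9


peak = peak_gops(ROWS, COLS, SIMD_LANES, CLOCK_HZ)
lane_bits = DATAPATH_BITS // SIMD_LANES
print(f"peak = {peak:.1f} GOPS on {lane_bits}-bit lanes")
```

Under these assumptions the peak comes out at 25.6 GOPS, consistent with the rounded 25 GOPS on the slide.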
200 Mbps+ SDR application driver
• IEEE 802.11n digital inner modem receiver
– Channel bonding 40MHz
– 2 antennas MIMO SDM OFDM

# ant.   mod. scheme   cod. rate   SNR [dB] @ BER = 10^-3
1        BPSK          1/2          3.0
1        QPSK          1/2          6.5
1        16QAM         1/2         12.5
1        64QAM         3/4         22.3
2        BPSK          1/2          5.5
2        QPSK          1/2         11.5
2        16QAM         1/2         18.0
2        64QAM         3/4         34.0
Profiling of SDR benchmarks and of the full OFDM
application proves real-time operation @100Mbps
[Figure: bar chart for 2-antenna SDM-OFDM @100Mbps, execution time @ 400MHz (µs), axis 0 to 18, per kernel: acorr, fshift, xcorr, fft (2x), freq offset estim., freq offset comp, tracking, SDM MMSE (2x), QAM demap, total preamble processing, total per symbol processing]
Great benefit in area but power higher than
dedicated hardware solutions
[Figure: two bar charts (axes 0 to 4 and 0 to 400) comparing area and power for 802.11n, 802.16e, DVB-H, 11n&16e and all standards; legend entries: SDR (IMEC), Reconf. (Intel), ASIC (Atheros), ASIC (source: Intel)]
Active Power VLIW: 75mW
Active Power CGA: 300mW
Leakage Power: 25mW
[Figure: power breakdown pie chart: CGA intercon - mux - pipeline 38%, FU CGA 25%, CMEM 13%, DMEM 10%, VLIW reg 6%, FU VLIW 4%, CGA reg 2%, peripherals 1%, I$ 1%, VLIW ctrl 0%]
The interconnection network dominates the
power consumption in VLIW and CGA modes
VLIW mode (Active power: 75mW, Leakage Power: 25mW)
• Interconnect + mux: 28%
• FU VLIW: 22%
• VLIW reg: 21%
• DMEM: 13%
• I$: 10%
• FU CGA: 2%
• peripherals: 2%
• CGA reg: 2%
• VLIW ctrl: 0%
• CMEM: 0%
CGA mode (Active power: 300mW, Leakage Power: 25mW)
• CGA intercon - mux - pipeline: 38%
• FU CGA: 25%
• CMEM: 13%
• DMEM: 10%
• VLIW reg: 6%
• FU VLIW: 4%
• CGA reg: 2%
• peripherals: 1%
• I$: 1%
• VLIW ctrl: 0%
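To make the shares on this slide concrete, they can be converted into absolute power per component. This is a small sketch only: the pairing of labels and percentages is reconstructed from the extracted pie charts and may not match the original figure exactly, and `breakdown_mw` is a hypothetical helper name.

```python
# Percentages as read from the two pie charts on this slide. The pairing
# of labels and values is a reconstruction (assumption), chosen so that
# each chart sums to 100%.
VLIW_MODE = {
    "Interconnect + mux": 28, "FU VLIW": 22, "VLIW reg": 21, "DMEM": 13,
    "I$": 10, "FU CGA": 2, "peripherals": 2, "CGA reg": 2,
    "VLIW ctrl": 0, "CMEM": 0,
}
CGA_MODE = {
    "CGA intercon - mux - pipeline": 38, "FU CGA": 25, "CMEM": 13,
    "DMEM": 10, "VLIW reg": 6, "FU VLIW": 4, "CGA reg": 2,
    "peripherals": 1, "I$": 1, "VLIW ctrl": 0,
}
ACTIVE_MW = {"VLIW": 75, "CGA": 300}   # active power per mode, from the slide


def breakdown_mw(shares, active_mw):
    """Convert percentage shares into milliwatts per component."""
    if sum(shares.values()) != 100:
        raise ValueError("shares must sum to 100%")
    return {name: active_mw * pct / 100 for name, pct in shares.items()}


vliw_mw = breakdown_mw(VLIW_MODE, ACTIVE_MW["VLIW"])
cga_mw = breakdown_mw(CGA_MODE, ACTIVE_MW["CGA"])
for mode, table in (("VLIW", vliw_mw), ("CGA", cga_mw)):
    top = max(table, key=table.get)
    print(f"{mode} mode: largest consumer is {top} at {table[top]:.0f} mW")
```

With these readings the interconnect accounts for about 21 mW in VLIW mode and about 114 mW in CGA mode, the largest single share in both.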
Wanted: SDR-Platform Aware Signal Processing
Elephant as Platform
Horse as Platform
Dynamic signal processing implementation
[Figure: 3GPP channel response and cycle count on SoA processor, both plotted over time]
Wanted: SDR-Platform Aware Signal Processing
ASIC as platform
• Requires simple control flow
• Requires manifest and regular computation structures
• Functional reuse not a must (reuse memory footprint only)
• Minimum data wordwidth
• Accommodates high computation loads
• Highest energy efficiency
SDR as platform
• Accommodates more complex control flows
• Accommodates complex and irregular computation structures
• Maximum functional reuse is a must
• Aligned data wordwidth
• Limited maximum computation load
• Lower energy efficiency
Algorithm-Architecture Co-Design
• Make algorithm compatible with architecture/compiler constraints
• Exploit opportunities of programmable architecture
Observation
• Wireless baseband processing implies high dynamics
• Wireless baseband processing tolerates inaccuracy
• This is already considered at system level (X-layer), but
what about in the signal processing implementation?
The opportunity
• Two viewpoints toward complexity
– Computation complexity and memory complexity
– Structure complexity (control flow, heterogeneity, etc.)
• Wireless system can cope with inaccuracy (“scalable” QoS)
• On SDR
– Computation complexity is much more costly than in ASIC
– Memory complexity is as costly as in ASIC
– Structure complexity is much less costly than in ASIC
• What can we do?
Increase the structure complexity of baseband processing
to reduce the average computation and memory complexity
by enabling run-time adaptation of the algorithms' implementation
to the dynamics in QoS requirement, environment (and platform)
[Figure: baseband ASIC with low structure complexity versus SDR baseband with high structure complexity]
Motivation: OFDMA Modulation Error requirements vary
WiMAX Specification
Modulation accuracy
can be relaxed for
lower order
modulation
RCE relaxation can be exploited by a scalable digital
OFDMA Modulator
• Original: A large-size (e.g., 1024) IFFT-based non-scalable
modulator
• Transformed: A scalable OFDMA modulator with 3 cascaded
components
Interpolation factor can be used as a knob to adjust the
accuracy and computation load to the RCE requirement
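The slide does not name the three cascaded components. As an illustration only, the sketch below assumes they are a small IFFT, an interpolator and a frequency shifter, and it uses plain linear interpolation as a deliberately cheap filter. All function names are hypothetical. A larger interpolation factor shrinks the IFFT but leaves the signal less oversampled before interpolation, so the relative constellation error (RCE) grows.

```python
import cmath
import math


def fft(x, sign=-1):
    """Recursive radix-2 transform; sign=+1 gives the unscaled inverse."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2], sign), fft(x[1::2], sign)
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + t, even[k] - t
    return out


def scalable_modulator(symbols, k0, n_fft, factor):
    """Place `symbols` on bins k0.. of an n_fft-point OFDM symbol using an
    IFFT of size n_fft // factor (assumed 3-stage structure)."""
    k, m = len(symbols), n_fft // factor
    # 1) Small IFFT with the allocation centred on DC.
    small = [0j] * m
    for i, s in enumerate(symbols):
        small[(i - k // 2) % m] = s
    y = [v / n_fft for v in fft(small, +1)]
    # 2) Circular linear interpolation by `factor`.
    z = []
    for p in range(m):
        a, b = y[p], y[(p + 1) % m]
        z.extend(a + (b - a) * q / factor for q in range(factor))
    # 3) Frequency shift back to the allocated band.
    shift = k0 + k // 2
    return [v * cmath.exp(2j * cmath.pi * shift * n / n_fft)
            for n, v in enumerate(z)]


def rce(symbols, k0, n_fft, factor):
    """RMS constellation error, measured after an ideal full-size FFT."""
    rx = fft(scalable_modulator(symbols, k0, n_fft, factor))
    rx = rx[k0:k0 + len(symbols)]
    err = sum(abs(r - s) ** 2 for r, s in zip(rx, symbols))
    return math.sqrt(err / sum(abs(s) ** 2 for s in symbols))


qpsk = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j] * 4   # 16 sub-carriers
results = {f: rce(qpsk, 40, 256, f) for f in (2, 4, 8)}
for f, value in results.items():
    print(f"factor {f}: IFFT size {256 // f}, RCE {20 * math.log10(value):.1f} dB")
```

A practical design would likely use a better interpolation filter than the linear one assumed here; the point of the sketch is only that one parameter trades accuracy against IFFT size.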
Computation load scales smoothly with the interpolation
factor
[Figure: normalized cycle count versus interpolation factor]
OFDMA mod./demod. requires (I)FFT with Partial
input/output
The position and number of bins change dynamically
Efficient Partial FFT on ILP Architectures
• Exploit the partial input/output to reduce
active instructions and memory accesses
• 30 years of theoretical research on PFFT but
few implementations
• We propose a generic and efficient scheme
for PFFT on ILP architectures
– Any pattern of bin-distribution can be implemented
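The idea of skipping work for unused bins can be illustrated with a textbook output-pruned FFT. This is not the scheme proposed on the slide, which targets ILP architectures; it is a generic radix-2 decimation-in-frequency sketch in which a branch is computed only if it leads to a requested bin. The names `pruned_fft`, `dft_bin` and the `stats` counter are hypothetical.

```python
import cmath


def pruned_fft(x, wanted, stats):
    """Radix-2 decimation-in-frequency FFT that only follows branches
    leading to a bin in `wanted`. Returns {bin: value}."""
    n = len(x)
    if n == 1:
        return {0: x[0]}
    half = n // 2
    even = {k // 2 for k in wanted if k % 2 == 0}
    odd = {k // 2 for k in wanted if k % 2 == 1}
    out = {}
    if even:   # even bins come from the sum half
        stats["ops"] += half
        sub = [x[i] + x[i + half] for i in range(half)]
        for r, v in pruned_fft(sub, even, stats).items():
            out[2 * r] = v
    if odd:    # odd bins come from the twiddled difference half
        stats["ops"] += half
        sub = [(x[i] - x[i + half]) * cmath.exp(-2j * cmath.pi * i / n)
               for i in range(half)]
        for r, v in pruned_fft(sub, odd, stats).items():
            out[2 * r + 1] = v
    return out


def dft_bin(x, k):
    """Direct DFT of a single bin, used only as a reference."""
    n = len(x)
    return sum(v * cmath.exp(-2j * cmath.pi * k * i / n)
               for i, v in enumerate(x))


N = 64
signal = [complex(i % 5, -(i % 3)) for i in range(N)]
wanted_bins = {3, 4, 5, 20}            # arbitrary, non-contiguous pattern
partial_stats, full_stats = {"ops": 0}, {"ops": 0}
partial = pruned_fft(signal, wanted_bins, partial_stats)
pruned_fft(signal, set(range(N)), full_stats)
print(partial_stats["ops"], "of", full_stats["ops"], "half-butterfly operations")
```

Because the set of wanted bins is an ordinary run-time argument, any bin pattern can be requested, at the cost of data-dependent control flow.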
The proposed scheme brings important gains in almost
all implementation cost factors and scales smoothly with
the number of sub-carriers to be processed
The price to pay is a higher instruction cache miss rate
(acceptable)
State-of-the-art
• Automatic floating-point to fixed-point conversion (>30 years of work)
– Commercial products: Catalytic Inc. & Mathworks
– Recent academic contributions:
• Simulation-based: Seoul National Univ. (‘95)
• Analytical methods: Aachen (‘98), Northwest Univ. (‘01)
• Hybrid methods: Imperial College (‘03), Berkeley (‘04) and ENSSAT (‘05)
• Run-time word-length selection: receiver VLSI architecture based on a
control feedback loop. Hokkaido University (‘06)
[Yoshizawa, S. et al., ISCAS’06]
Modeling of the fixed-point communication system
• Performance of the communication
system as a function of the receiver
SNR
– BER = f(SNR)
• Fixed-point refined system includes
quantization noise
– BER = f(SNR, na, nb, …) = f’(SNR) ≈ f(SNR’)
• Implementation-scenarios defined
and optimized at design time
[Figure: data-flow graph with quantization noise sources na, nb, nc]
[Figure: throughput [Mbps] (0 to 120) versus SNR [dB] (0 to 40) for BPSK 1/2, QPSK 1/2, 16QAM 1/2 and 64QAM 2/3, with regions A, B, C, D]
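The relation BER = f(SNR, na, nb, …) ≈ f(SNR’) on this slide treats quantization noise as a degradation of the receiver SNR. A minimal sketch of that idea follows, assuming the noise sources are independent and additive, that they are all referred to the same point with unit gain, and that each quantizer follows the textbook 6.02·b + 1.76 dB rule for a full-scale sine. None of these assumptions comes from the slide.

```python
import math


def db_to_lin(db):
    return 10 ** (db / 10)


def lin_to_db(x):
    return 10 * math.log10(x)


def sqnr_db(bits):
    """Textbook SQNR of a uniform quantizer driven by a full-scale sine
    (assumption; real signals usually need extra headroom)."""
    return 6.02 * bits + 1.76


def effective_snr_db(snr_db, word_lengths):
    """SNR' after adding one independent quantization noise source per
    entry of `word_lengths` to the channel noise."""
    inverse = 1 / db_to_lin(snr_db)
    for bits in word_lengths:
        inverse += 1 / db_to_lin(sqnr_db(bits))
    return lin_to_db(1 / inverse)


for bits in (6, 8, 12):
    loss = 20.0 - effective_snr_db(20.0, [bits, bits])
    print(f"{bits}-bit words at 20 dB channel SNR: {loss:.2f} dB SNR loss")
```

Under this model, a mode that operates at low SNR tolerates much shorter words for the same SNR loss, which is the margin a fixed-point refinement can exploit.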
Opportunity: application dynamics and tolerance to
inaccuracy can be propagated to the implementation
SYSTEM LEVEL
• Multiple link parameters trade off noise/interference robustness
versus data rate
[Figure: throughput [Mbps] (0 to 120) versus SNR [dB] (0 to 40) for BPSK 1/2, QPSK 1/2, 16QAM 1/2 and 64QAM 2/3, with regions A, B, C, D]
IMPLEMENTATION LEVEL
• Different system configurations have different requirements in
[digital] signal processing accuracy → use different implementations
[Figure: link chain TX, channel, additive noise, analog FE, digital DSP, RX, annotated with SNR and #bits]
• We adapt the application fixed-point mapping at run-time
• By switching between the “mappings”, the average load is reduced
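The last bullet can be made concrete with a short expected-value calculation. The probabilities and cycle counts below are hypothetical numbers chosen for illustration, not measurements from the presentation.

```python
def average_load(scenarios):
    """Expected cycles per symbol for (probability, cycles) pairs."""
    total = sum(p for p, _ in scenarios)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(p * cycles for p, cycles in scenarios)


# Hypothetical mappings: (fraction of time in use, cycles per symbol).
mappings = [
    (0.5, 600),    # short words, enough for robust low-rate modes
    (0.3, 800),    # medium words
    (0.2, 1000),   # full precision, needed only for the highest-rate mode
]

static_load = max(cycles for _, cycles in mappings)   # always worst case
adaptive_load = average_load(mappings)
saving = 1 - adaptive_load / static_load
print(f"static {static_load}, adaptive {adaptive_load:.0f} cycles/symbol, "
      f"saving {saving:.0%}")
```

The saving depends entirely on how often the cheaper mappings are usable, so the benefit is largest when the link spends most of its time in low-rate modes.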
SDR enables more agile signal processing
implementations
[Figure: block diagram; labels: QoS req., run-time controller, Chan Att over Freq and Time, monitoring info, adapt data format, DSP implementation, ENVIRONMENT, current conditions, monitor, impl. A, impl. B, ... impl. N, BB func., SDR PLATFORM, program memory, Scen. sel, SDR processor]
• Several sw implementations of the same functionality with
different precision/computation load
• Monotonic relation between precision/load
• One can switch between sw implementations in a few cycles
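The scenario selection described on this slide can be sketched as a lookup over pre-built implementations. The table values, the `Implementation` type and the `select` function are hypothetical; the sketch only assumes what the bullets state, namely that precision and load grow together.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Implementation:
    name: str
    precision_db: float   # precision delivered by this mapping (hypothetical)
    load: int             # cycles per symbol (hypothetical)


# Hypothetical design-time table; precision and load rise monotonically.
TABLE = [
    Implementation("A", 40.0, 400),
    Implementation("B", 60.0, 700),
    Implementation("N", 80.0, 1000),
]


def select(implementations, required_db):
    """Return the cheapest implementation that meets the requirement,
    falling back to the most precise one if none does."""
    feasible = [i for i in implementations if i.precision_db >= required_db]
    if not feasible:
        return max(implementations, key=lambda i: i.precision_db)
    return min(feasible, key=lambda i: i.load)


for required in (35.0, 55.0, 90.0):
    chosen = select(TABLE, required)
    print(f"required {required:.0f} dB -> impl. {chosen.name} ({chosen.load} cycles)")
```

Because the candidates are fixed at design time, the run-time decision reduces to a small table search, which fits the claim that switching costs only a few cycles.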
Dynamic fixed-point format assignment increases energy
efficiency in situations requiring lower performance
Increase in scalability
• Energy efficiency
increased at lower
rate modes
• Average energy
consumption is
reduced
Conclusions
• Energy efficiency of flexible implementations comes closer to that of their
dedicated hardware counterparts:
– Has the potential to continuously best-fit the dynamism.
• Does not rely on hypothetical provision in the standards:
– Implementation centric
– Applicable to any functional-level algorithmic solutions
• Wireless systems context today but also other domains
tomorrow:
– Digital signal processing with an SNR type constraint and which has
dynamic data resolution variation
– biomedical signal processing, multimedia, etc.