Architectural Exploration:

Area-Performance tradeoff in
802.11a Transmitter
Arvind
Computer Science & Artificial Intelligence Lab
Massachusetts Institute of Technology
March, 2007
http://csg.csail.mit.edu/arvind
802.11a-1
802.11a Transmitter Overview
[Figure: headers and data (24 uncoded bits) enter the Controller and flow through Scrambler → Encoder → Interleaver → Mapper → IFFT → Cyclic Extend.]
The transmitter must produce one OFDM symbol (64 complex numbers) every 4 μsec.
Mapper: depending upon the transmission rate, it consumes 1, 2 or 4 tokens to produce one OFDM symbol.
IFFT: transforms 64 (frequency domain) complex numbers into 64 (time domain) complex numbers; it accounts for 85% of the area.
Preliminary results
[MEMOCODE 2006] Dave, Gerding, Pellauer, Arvind

Design Block     Lines of Code (BSV)   Relative Area
Controller       49                    0%
Scrambler        40                    0%
Conv. Encoder    113                   0%
Interleaver      76                    1%
Mapper           112                   11%
IFFT             95                    85%
Cyc. Extender    23                    3%

Complex arithmetic libraries constitute another 200 lines of code.
Combinational IFFT
[Figure: inputs in0 … in63 pass through three stages; each stage is a column of 16 Bfly4 nodes followed by a Permute network, producing outputs out0 … out63.]
Reuse the same circuit three times to reduce area.
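As a sanity check on the "three stages of 16 Bfly4 nodes" structure, here is a minimal Python sketch of a 64-point radix-4 IFFT in which every stage is the same kind of circuit: 16 Bfly4 nodes, then one fixed Permute. Several details are assumptions rather than taken from the slides: the constant-geometry factorization, the digit-reversed input order, the Permute wiring (element o moves to o//4 + 16·(o%4)), the stage-dependent twiddle factors, and the absence of 1/64 scaling. The design in the slides may factor the transform differently.

```python
import cmath

N = 64  # 64-point IFFT = 3 radix-4 stages of 16 Bfly4 nodes each

def digit_reverse(i):
    """Reverse the three base-4 digits of i (0 <= i < 64)."""
    d0, d1, d2 = i % 4, (i // 4) % 4, i // 16
    return d2 + 4 * d1 + 16 * d0

def bfly4(a):
    """4-point inverse DFT: out[p] = sum_q a[q] * i**(p*q)."""
    return [sum(a[q] * 1j ** (p * q) for q in range(4)) for p in range(4)]

def permute(x):
    """Fixed wiring used after every stage: element o moves to o//4 + 16*(o%4)."""
    y = [0j] * N
    for o in range(N):
        y[o // 4 + 16 * (o % 4)] = x[o]
    return y

def stage_f(stage, x):
    """One column of 16 Bfly4 nodes with stage-dependent twiddles, then Permute."""
    w = cmath.exp(2j * cmath.pi / 4 ** (stage + 1))
    y = []
    for g in range(16):
        j = g // 4 ** (2 - stage)  # twiddle exponent for node g in this stage
        y.extend(bfly4([x[4 * g + q] * w ** (q * j) for q in range(4)]))
    return permute(y)

def ifft64(x):
    """Combinational IFFT: the same stage structure applied three times."""
    y = [x[digit_reverse(i)] for i in range(N)]  # input reordering (assumption)
    for stage in range(3):
        y = stage_f(stage, y)
    return y  # unnormalized: sum_n x[n] * exp(+2*pi*i*n*k/64)
```

Only the twiddles depend on `stage`. This is why a single physical stage can be reused, as the following slides do.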
Design Alternatives
Reuse a block over multiple cycles
[Figure: blocks f and g, with a single block reused over multiple cycles]
We expect:
Throughput to decrease
Area to decrease
The clock needs to run faster for the same throughput ⇒ hyper-linear increase in energy.
Circular pipeline: Reusing the Pipeline Stage
[Figure: in0 … in63 pass through a single column of Bfly4 nodes and a Permute network whose output is fed back to the stage input; a Stage Counter tracks the current pass, and the final pass produces out0 … out63.]
Superfolded circular pipeline: Just one Bfly-4 node!
[Figure: in0 … in63 enter through 64 2-way muxes; 4 16-way muxes select the inputs of a single Bfly4 node and 4 16-way demuxes write its outputs back, together with a Permute network. A Stage counter (0 to 2) and an Index counter (0 to 15) sequence the node, with an "Index == 15?" test; the outputs are out0 … out63.]
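The sequencing of the single Bfly4 node can be modelled in Python, one node evaluation per clock. The sketch makes several assumptions of its own, not the slides': the radix-4 factorization (digit-reversed input, fixed Permute o → o//4 + 16·(o%4), stage-dependent twiddles, no 1/64 scaling), and applying the Permute when Index reaches 15. It counts clock cycles. This makes the 3 × 16 = 48 cycles per symbol visible.

```python
import cmath

N = 64

def digit_reverse(i):
    """Reverse the three base-4 digits of i (0 <= i < 64)."""
    return i // 16 + 4 * ((i // 4) % 4) + 16 * (i % 4)

def bfly4(a, stage, j):
    """The single Bfly4 node: twiddle input q by w**(q*j), then a 4-point inverse DFT."""
    w = cmath.exp(2j * cmath.pi / 4 ** (stage + 1))
    t = [a[q] * w ** (q * j) for q in range(4)]
    return [sum(t[q] * 1j ** (p * q) for q in range(4)) for p in range(4)]

def superfolded_ifft64(x):
    """Returns (result, cycles); one Bfly4 evaluation per clock cycle."""
    reg = [x[digit_reverse(i)] for i in range(N)]  # load through the input muxes
    stage, index, cycles = 0, 0, 0                 # Stage: 0 to 2, Index: 0 to 15
    while stage < 3:
        group = reg[4 * index:4 * index + 4]       # select 4 values (16-way muxes)
        j = index // 4 ** (2 - stage)
        reg[4 * index:4 * index + 4] = bfly4(group, stage, j)  # write back (16-way demuxes)
        cycles += 1
        if index == 15:                            # last node of this stage
            permuted = [0j] * N
            for o in range(N):
                permuted[o // 4 + 16 * (o % 4)] = reg[o]
            reg, stage, index = permuted, stage + 1, 0
        else:
            index += 1
    return reg, cycles
```

The area saved by keeping one node is paid for in time: 48 clocks per OFDM symbol instead of a handful.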
Pipelining a block
[Figure: three implementations of f1, f2, f3 between inQ and outQ: Combinational (C), Pipeline (P), and Folded Pipeline (FP), which reuses a single block f. Questions posed for each: Clock? Area? Throughput?]
Synchronous pipeline
[Figure: inQ → f1 → sReg1 → f2 → sReg2 → f3 → outQ]

rule sync_pipeline (True);
   inQ.deq();
   sReg1 <= f1(inQ.first());
   sReg2 <= f2(sReg1);
   outQ.enq(f3(sReg2));
endrule

This is real IFFT code; just replace f1, f2 and f3 with stage_f code.
This rule can fire only if inQ is not empty and outQ is not full.
Atomicity: Either all or none of the state elements inQ, outQ, sReg1 and sReg2 will be updated.
Stage functions f1, f2 and f3

function f1(x);
   return (stage_f(1,x));
endfunction
function f2(x);
   return (stage_f(2,x));
endfunction
function f3(x);
   return (stage_f(3,x));
endfunction

The stage_f function was given earlier.
Problem: What about pipeline bubbles?
[Figure: inQ → f1 → sReg1 → f2 → sReg2 → f3 → outQ, with a red and a green token in the pipeline registers]

rule sync_pipeline (True);
   inQ.deq();
   sReg1 <= f1(inQ.first());
   sReg2 <= f2(sReg1);
   outQ.enq(f3(sReg2));
endrule

Red and Green tokens must move even if there is nothing in the inQ!
Also, if there is no token in sReg2, then nothing should be enqueued in the outQ.
Modify the rule to deal with these conditions, using valid bits or the Maybe type.
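A small Python cycle model of the unguarded rule shows both problems. It makes these assumptions: f1, f2 and f3 are hypothetical stand-ins for the stage_f calls, outQ never fills, and both registers reset to 0. The rule fires only while inQ is non-empty. All register updates read the old values and commit together.

```python
from collections import deque

def f1(x): return x + 1  # hypothetical stand-in for stage_f(1, x)
def f2(x): return x * 2  # hypothetical stand-in for stage_f(2, x)
def f3(x): return x - 3  # hypothetical stand-in for stage_f(3, x)

def naive_sync_pipeline(inputs, cycles, reset_value=0):
    """Cycle model of the unguarded sync_pipeline rule (outQ assumed never full)."""
    in_q, out_q = deque(inputs), []
    s_reg1 = s_reg2 = reset_value  # registers start with their reset contents
    for _ in range(cycles):
        if not in_q:               # implicit condition of inQ.first()/inQ.deq()
            continue               # rule cannot fire: nothing moves
        x = in_q.popleft()
        # right-hand sides use the old register values; all updates commit atomically
        s_reg1, s_reg2, out = f1(x), f2(s_reg1), f3(s_reg2)
        out_q.append(out)
    return out_q, (s_reg1, s_reg2)
```

With inputs 10, 20 and 30, the first two outputs (-3, -3) come from the reset contents of the registers. Only 19 is a real result. The tokens for 20 and 30 stay stuck in sReg1 and sReg2 once inQ is empty.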
The Maybe type data in the pipeline

typedef union tagged {
   void Invalid;
   data_T Valid;
} Maybe#(type data_T);

Registers contain Maybe type values: the data plus a valid/invalid tag.

rule sync_pipeline (True);
   if (inQ.notEmpty())
      begin sReg1 <= tagged Valid f1(inQ.first()); inQ.deq(); end
   else sReg1 <= tagged Invalid;
   case (sReg1) matches
      tagged Valid .sx1: sReg2 <= tagged Valid f2(sx1);
      tagged Invalid:    sReg2 <= tagged Invalid;
   endcase
   case (sReg2) matches
      tagged Valid .sx2: outQ.enq(f3(sx2));
   endcase
endrule
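The same three-stage pipeline can be modelled in Python with `None` playing the role of `tagged Invalid`. In this model, bubbles enter when inQ is empty and tokens keep draining. Only valid data reaches outQ. The stage functions are hypothetical stand-ins, and outQ is assumed never to fill.

```python
from collections import deque

def f1(x): return x + 1  # hypothetical stand-in for stage_f(1, x)
def f2(x): return x * 2  # hypothetical stand-in for stage_f(2, x)
def f3(x): return x - 3  # hypothetical stand-in for stage_f(3, x)

def maybe_sync_pipeline(inputs, cycles):
    """Cycle model of the Maybe-based rule; None stands for tagged Invalid."""
    in_q, out_q = deque(inputs), []
    s_reg1 = s_reg2 = None                # registers start Invalid
    for _ in range(cycles):
        # compute every next-state value from the current register contents
        next1 = f1(in_q.popleft()) if in_q else None             # bubble if inQ empty
        next2 = f2(s_reg1) if s_reg1 is not None else None
        if s_reg2 is not None:
            out_q.append(f3(s_reg2))      # enqueue only valid data
        s_reg1, s_reg2 = next1, next2     # registers update together
    return out_q
```

Three inputs now produce three correct results after five cycles, with a latency of three cycles for the first one.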
Folded pipeline
The same code will work for superfolded pipelines by changing n and the stage function f.
[Figure: inQ → f → outQ, with the output of f fed back through sReg and a stage counter selecting the function.]

rule folded_pipeline (True);
   if (stage == 0)
      begin sxIn = inQ.first(); inQ.deq(); end
   else
      sxIn = sReg;
   sxOut = f(stage, sxIn);
   if (stage == n-1) outQ.enq(sxOut);
   else sReg <= sxOut;
   stage <= (stage == n-1) ? 0 : stage + 1;
endrule

Notice that stage is a dynamic parameter now! There is no for-loop.
Need type declarations for sxIn and sxOut.
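A Python cycle model of the folded rule makes its cost visible: one item occupies the single stage for n consecutive cycles. Several choices here are hypothetical: the three stage functions, an outQ that never fills, and the model's choice to check inQ only when stage == 0. That last choice means it assumes the empty-inQ condition does not block the rule at other stages.

```python
from collections import deque

N_STAGES = 3

def f(stage, x):
    """Hypothetical stage function; stage is a dynamic (run-time) argument."""
    if stage == 0:
        return x + 1
    if stage == 1:
        return x * 2
    return x - 3

def folded_pipeline(inputs, cycles, n=N_STAGES):
    """Cycle model of folded_pipeline: one stage reused n times per item."""
    in_q, out_q = deque(inputs), []
    stage, s_reg = 0, None
    for _ in range(cycles):
        if stage == 0:
            if not in_q:          # nothing to start: the rule cannot fire
                continue
            sx_in = in_q.popleft()
        else:
            sx_in = s_reg         # feed back the partial result
        sx_out = f(stage, sx_in)
        if stage == n - 1:
            out_q.append(sx_out)
        else:
            s_reg = sx_out
        stage = 0 if stage == n - 1 else stage + 1
    return out_q
```

Three items need nine cycles here, compared with one new item per cycle in the synchronous pipeline. This is the throughput-for-area trade described above.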
802.11a Transmitter Synthesis results (only the IFFT block is changing)

IFFT Design                Area (mm²)   Throughput Latency (CLKs/sym)   Min. Freq Required
Pipelined                  5.25         4                               1.0 MHz
Combinational              4.91         4                               1.0 MHz
Folded (16 Bfly-4s)        3.97         4                               1.0 MHz
Super-Folded (8 Bfly-4s)   3.69         6                               1.5 MHz
SF (4 Bfly-4s)             2.45         12                              3.0 MHz
SF (2 Bfly-4s)             1.84         24                              6.0 MHz
SF (1 Bfly-4)              1.52         48                              12 MHz

The Super-Folded variants share the same source code.
All these designs were done in less than 24 hours!
TSMC .18 micron; numbers reported are before place and route.
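The Super-Folded rows follow from simple arithmetic. A 64-point radix-4 IFFT needs 3 × 16 = 48 Bfly4 evaluations per OFDM symbol, so k physical Bfly4 nodes need 48/k clocks. A symbol is due every 4 μsec, so the minimum clock in MHz is (clocks per symbol) / 4. The 48/k formula is an assumption that matches only the Super-Folded rows; it does not explain the 4 CLKs/sym reported for the 16-Bfly-4 folded design.

```python
SYMBOL_PERIOD_US = 4.0          # one OFDM symbol every 4 usec
BFLY4_OPS_PER_SYMBOL = 3 * 16   # 3 radix-4 stages x 16 Bfly4 nodes

def clocks_per_symbol(n_bfly4):
    """Clocks per symbol when n_bfly4 physical nodes share all the Bfly4 work."""
    return BFLY4_OPS_PER_SYMBOL // n_bfly4

def min_freq_mhz(clks_per_symbol):
    """Clocks per microsecond needed to finish one symbol every 4 usec (= MHz)."""
    return clks_per_symbol / SYMBOL_PERIOD_US

for k in (8, 4, 2, 1):
    c = clocks_per_symbol(k)
    print(f"SF ({k} Bfly-4s): {c} CLKs/sym -> {min_freq_mhz(c)} MHz")
```

The same rate formula gives 1.0 MHz for the 4 CLKs/sym designs.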
Why are the areas so similar?
Folding should have given a 3x improvement in IFFT area
BUT a constant twiddle allows low-level optimization on a Bfly-4 block
  - a 2.5x area reduction!
In the folded designs a Bfly-4 is reused across stages, so its twiddles are no longer constant and this optimization is lost.
Parameterize the synchronous pipeline
[Figure: inQ → f1 → sReg[1] → … → sReg[n-1] → fn → outQ]
n and stage are static parameters.

Vector#(n, Reg#(t)) sReg <- replicateM(mkReg(tagged Invalid));
rule sync_pipeline (True);
   if (inQ.notEmpty())
      begin (sReg[1]) <= tagged Valid f(0, inQ.first()); inQ.deq(); end
   else (sReg[1]) <= tagged Invalid;
   for (Integer stage = 1; stage < n-1; stage = stage+1)
      case (sReg[stage]) matches
         tagged Valid .sx: (sReg[stage+1]) <= tagged Valid f(stage, sx);
         tagged Invalid:   (sReg[stage+1]) <= tagged Invalid;
      endcase
   case (sReg[n-1]) matches
      tagged Valid .sx: outQ.enq(f(n-1, sx));
   endcase
endrule
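The parameterized rule can be modelled in Python with a list of n registers (index 0 unused, as in the code above) and `None` for `tagged Invalid`. The Python `for` loop stands in for the loop that BSV unrolls at elaboration time, since n is static. The stage function is hypothetical, outQ is assumed never to fill, and n must be at least 2.

```python
from collections import deque

def f(stage, x):
    """Hypothetical stage function: appends the stage number as a decimal digit."""
    return x * 10 + stage

def param_sync_pipeline(inputs, cycles, n):
    """n stages f(0..n-1); s_reg[1..n-1] hold Maybe values (None = Invalid)."""
    in_q, out_q = deque(inputs), []
    s_reg = [None] * n                      # s_reg[0] is unused
    for _ in range(cycles):
        nxt = [None] * n                    # next-state values, committed together
        nxt[1] = f(0, in_q.popleft()) if in_q else None
        for stage in range(1, n - 1):       # statically unrolled in the BSV rule
            if s_reg[stage] is not None:
                nxt[stage + 1] = f(stage, s_reg[stage])
        if s_reg[n - 1] is not None:
            out_q.append(f(n - 1, s_reg[n - 1]))
        s_reg = nxt
    return out_q
```

For example, with n = 3 the input 1 leaves as 1012 (stages 0, 1 and 2 applied in order), three cycles after it enters.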
Syntax: Vector of Registers
Register
  - suppose x and y are both of type Reg. Then x <= y means x._write(y._read())
Vector of (say) Int
  - x[i] means sel(x,i)
  - x[i] = y[j] means x = update(x,i, sel(y,j))
Vector of Registers
  - x[i] <= y[j] does not work. The parser thinks it means (sel(x,i)._read)._write(sel(y,j)._read), which will not type check
  - (x[i]) <= y[j] does work!