ACS Unit for Viterbi Decoder - University of California, Berkeley

UNIVERSITY OF CALIFORNIA
College of Engineering
Department of Electrical Engineering and Computer Sciences
Project Phase I Specification
NTU IC541CA (Fall 2001)
1. Designing a Viterbi Decoder - Background
The Viterbi algorithm is commonly used in a wide range of communications and data storage
applications. It is used for decoding convolutional codes, in baseband detection for wireless
systems, and for detection of recorded data in magnetic disk drives. The requirements for the
Viterbi decoder, which is a processor that implements the Viterbi algorithm, depend on the
application in which it is used. This results in a very wide range of required data throughputs and
may impose area or power restrictions. The Viterbi detectors used in cellular telephones have low
data rates (typically less than 1Mb/s) but must have very low energy consumption. On the opposite
end of the scale, very high speed Viterbi detectors are used in magnetic disk drive read channels,
with throughputs over 600Mb/s, but power consumption is not as critical. Since both of these are
high volume applications, reduced silicon area can reduce cost significantly.
In this semester’s project we will design a critical part of a Viterbi decoder, under different design
constraints.
1.1. The Viterbi Algorithm
Any realistic transmission medium will have some distortion, so the job of the receiver is to figure
out what the transmitter actually sent based upon the noisy signal it received. In a cell phone, there
can be interference or low signal strength making it hard to decide what each bit is. In a hard drive,
the disk is spinning so fast that the binary nature of the data tends to get blurry. In these cases,
instead of outputting bits, the receiver outputs ‘soft symbols’. This description will assume that
each soft symbol is a ‘fuzzy’ bit r, which is continuous over the range [0, 1]. An example output
sequence from the receiver might begin like:
time    r
0 us    0.05
10 us   0.45
20 us   0.65
30 us   0.8
40 us   0.4
The Viterbi algorithm takes a sequence of soft symbols and determines the most likely sequence of
real bits. Although your first instinct might be to just round each fuzzy bit to zero or one (a
technique called slicing), this often turns out to be inaccurate in the presence of inter-symbol
interference (ISI). ISI occurs when the values of previous bits affect the current bit.
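As a small illustration of slicing, the example soft symbols listed earlier can be thresholded at 0.5. This is only a sketch: the function name `slice_bits` and the 0.5 threshold are assumptions made for illustration and are not part of the project specification.

```python
def slice_bits(soft_symbols, threshold=0.5):
    """Round each soft symbol to a hard bit, ignoring any ISI.

    The 0.5 threshold is an assumed midpoint of the [0, 1] range.
    """
    return [1 if r >= threshold else 0 for r in soft_symbols]


received = [0.05, 0.45, 0.65, 0.8, 0.4]  # example sequence from the text
print(slice_bits(received))  # [0, 0, 1, 1, 0]
```

Symbols such as 0.45 and 0.4 sit close to the threshold, which is where a slicer is most likely to make the wrong decision.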
A trellis diagram is a time-indexed version of a state machine, and the simplest 2-state trellis is
shown in Figure 1. Each state corresponds to a possible pattern of recently received data bits and
each branch corresponds to a receipt of the next (noisy) input. The goal is to find the path through
the trellis of maximum likelihood, because that path corresponds to the most likely pattern that the
transmitter actually sent.
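[Figure: states sm1 and sm2 at times t_{n-1} and t_n on a time axis. Branch bm1 runs from sm1 to sm1, bm2 from sm1 to sm2, bm3 from sm2 to sm1, and bm4 from sm2 to sm2.]
Figure 1: Two state trellis.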
There are three basic components of a Viterbi decoder: the branch metric unit, the add-compare-select (ACS) unit, and the trace-back unit. The branch metric unit takes the fuzzy bit and calculates
the cost for each branch of the trellis. A simple branch metric unit might use Euclidean distance
from the received symbol to the equalization target tk (zero or one in our case):
bm_k = (r - t_k)^2, k = 1, 2, 3, 4
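The branch metric formula can be tried on the example soft symbols with a few lines of Python. This is a floating-point sketch only; the function name `branch_metric` is an assumption, and a hardware branch metric unit would quantize the result (Section 1.2 uses 5-bit branch metrics). The text does not say which target belongs to which branch, so the sketch simply evaluates the cost against both targets.

```python
def branch_metric(r, target):
    """Squared Euclidean distance between soft symbol r and a target level."""
    return (r - target) ** 2


received = [0.05, 0.45, 0.65, 0.8, 0.4]  # example sequence from the text
for r in received:
    cost0 = branch_metric(r, 0)  # cost if the target is 0
    cost1 = branch_metric(r, 1)  # cost if the target is 1
    print(f"r = {r:.2f}   cost to 0 = {cost0:.4f}   cost to 1 = {cost1:.4f}")
```

A symbol close to a target yields a small cost for that target, so unlikely branches become expensive.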
The add-compare-select unit is the heart of the Viterbi algorithm and calculates the state metrics.
These state metrics accumulate the minimum cost of ‘arriving’ in a specific state. The branch
metrics are added to state metrics from the previous time instant and the smaller sum is selected as
the new state metric:
sm1_n = min( sm1_{n-1} + bm1, sm2_{n-1} + bm3 )
sm2_n = min( sm1_{n-1} + bm2, sm2_{n-1} + bm4 )
At any point in time, the state with the smaller state metric ends the most likely sequence. The trace-back
unit can then output the sequence of branches used to get to that state. In theory, finding the most
likely path would require processing the entire input sequence. In practice, the survivor paths
merge after some number of iterations. The trellis depth at which all the survivor paths merge with
high probability is referred to as the survivor path length.
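The state metric update and the trace-back can be sketched in Python as follows. The function names, the tie-breaking rule (ties go to the branch from state 1), and the integer branch metrics in the usage example are assumptions chosen for illustration; they do not come from the specification.

```python
def acs_step(sm1_prev, sm2_prev, bm1, bm2, bm3, bm4):
    """One add-compare-select update for the two-state trellis.

    Returns (sm1, sm2, pred1, pred2), where pred1 and pred2 record which
    previous state (1 or 2) survived on the way into state 1 and state 2.
    """
    # Add: candidate path costs into each state
    into1 = (sm1_prev + bm1, sm2_prev + bm3)
    into2 = (sm1_prev + bm2, sm2_prev + bm4)
    # Compare and select: keep the cheaper candidate (assumed tie-break: state 1)
    pred1 = 1 if into1[0] <= into1[1] else 2
    pred2 = 1 if into2[0] <= into2[1] else 2
    return min(into1), min(into2), pred1, pred2


def trace_back(predecessors, final_state):
    """Follow the stored survivor decisions backwards from final_state."""
    path = [final_state]
    for pred1, pred2 in reversed(predecessors):
        path.append(pred1 if path[-1] == 1 else pred2)
    path.reverse()
    return path  # states visited, oldest first


# Two trellis steps with made-up integer branch metrics (bm1, bm2, bm3, bm4)
sm1, sm2 = 0, 0
decisions = []
for bms in [(0, 3, 1, 2), (4, 1, 3, 0)]:
    sm1, sm2, p1, p2 = acs_step(sm1, sm2, *bms)
    decisions.append((p1, p2))

best_state = 1 if sm1 <= sm2 else 2
print(sm1, sm2, trace_back(decisions, best_state))  # 4 1 [1, 1, 2]
```

Storing only one predecessor per state per step is what keeps the survivor memory small; the full path is rebuilt only when the trace-back runs.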
1.2. Implementing the ACS Unit
The ACS unit is responsible for implementing the state metric computation. A direct
implementation requires two identical datapaths each with two additions, a comparison, and a
selection (hence the name ACS). The basic datapath is shown in Figure 2. Note that although the
branch metrics are positive numbers with 5 bits each, the state metrics are required to have 7
bits. After some amount of time, the state metrics will overflow so a trick called modulo
normalization is used. The upshot is that one extra bit is required in the state metric (8 total) and the
MSB of a simple subtraction can be used for the compare operation. Lastly, an array of 2:1
multiplexers is used to pick the minimum.
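The text names modulo normalization without giving details, so the following Python model is a sketch of the general technique rather than the required circuit. It assumes that the sums are allowed to wrap around at 8 bits, that the two sums being compared never truly differ by 128 or more, that the subtraction is the first sum minus the second, and that an MSB of 1 selects the first sum. The function name `acs_mod8` is hypothetical.

```python
WIDTH = 8
MASK = (1 << WIDTH) - 1  # 0xFF, models an 8-bit datapath


def acs_mod8(sm1, bm1, sm2, bm2):
    """Bit-level model of one ACS datapath with wrap-around state metrics.

    Returns (nsm, msb): the selected 8-bit sum and the compare bit.
    """
    a = (sm1 + bm1) & MASK      # 8-bit adder, overflow is simply dropped
    b = (sm2 + bm2) & MASK
    diff = (a - b) & MASK       # 8-bit subtraction, modulo 256
    msb = diff >> (WIDTH - 1)   # 1 means a - b is negative, so a is smaller
    nsm = a if msb else b       # 2:1 multiplexer controlled by the MSB
    return nsm, msb


print(acs_mod8(10, 3, 20, 1))     # (13, 1): no wrap-around, 13 < 21
print(acs_mod8(250, 3, 250, 10))  # (253, 1): 260 wraps to 4, yet 253 still wins
```

In the second call a plain unsigned comparison would pick the wrapped value 4, even though it stands for 260. The modulo subtraction gives the right answer because only the difference between the sums matters.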
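[Figure: ACS datapath. Inputs bm1 and bm2 (5 bits each) and sm1 and sm2 (8 bits each) feed two adders. The MSB of the subtraction of the two 8-bit sums drives the select input S of a 2:1 multiplexer, which outputs the 8-bit nsm.]
Figure 2: Block diagram of ACS unit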
2. Implementation and Constraints
The goal is to design an ACS unit to be used in a Viterbi decoder. The project will be completed in
TWO phases.
PHASE 1: Design conception, schematic capture, simulation
PHASE 2: Layout, simulation with parasitics, comparison to schematic simulation
PHASE 1 GOALS:
The goal of the first phase is to perform the logic optimization, circuit style selection, and first-order
COMBINATIONAL circuit optimization to implement an efficient design. Fine tuning can be
done in the second phase after layout constraints become clearer.
You should select one of the following design CASES:
a) Low data throughput: Design a single ACS such that the average energy is minimized
while still meeting the constraint that the worst-case delay is smaller than 50 ns! No
constraints are put on the area.
b) High data throughput: Maximize the single ACS operating speed. No constraints are put
on area or power.
c) Low area decoder: Minimize the area of a single ACS, while meeting the constraint that
the worst-case delay is smaller than 50 ns! No constraints are put on energy
consumption.
You are free to choose any logic family for the implementation: complementary CMOS, pseudo-NMOS, pass-transistor logic, dynamic logic, etc.
2.1. TECHNOLOGY: The design is to be implemented in a 0.25 µm CMOS process with 4
metal layers. The SPICE technology is in the g25.mod file available on the web site.
2.2. POWER SUPPLY: You are free to choose any supply voltage and logic swing up to a
maximum of 2.5 V. Make sure that you use the appropriate model when you perform hand
analysis.
2.3. PERFORMANCE METRIC: The propagation delay for static designs is defined as the
time interval between the 50% transition point of the inputs and the 50% point of the worst-case
output signal. Make sure you pick the worst-case condition and state EXPLICITLY in your report
what that condition is. Note that for dynamic designs the propagation delay is defined in this case
as the delay of the evaluate phase ONLY (at least in this phase of the project)!
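The 50%-to-50% definition can be applied to sampled waveforms exported from a simulator. The Python sketch below assumes the input and output share one time axis and uses linear interpolation between samples. The function names and the sample waveform are made up for illustration and are not simulation results.

```python
def crossing_time(times, volts, level):
    """Time of the first crossing of `level`, by linear interpolation."""
    for i in range(1, len(times)):
        v0, v1 = volts[i - 1], volts[i]
        if (v0 - level) * (v1 - level) <= 0 and v0 != v1:
            frac = (level - v0) / (v1 - v0)
            return times[i - 1] + frac * (times[i] - times[i - 1])
    raise ValueError("waveform never crosses the requested level")


def propagation_delay(times, v_in, v_out, v_low, v_high):
    """Delay between the 50% points of the input and output transitions."""
    mid = 0.5 * (v_low + v_high)
    return crossing_time(times, v_out, mid) - crossing_time(times, v_in, mid)


# Illustrative samples: time in ns, voltage in V, 2.5 V swing
t = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
v_in = [0.0, 2.5, 2.5, 2.5, 2.5, 2.5]   # input rises in 200 ps
v_out = [2.5, 2.5, 2.5, 1.5, 0.5, 0.0]  # output falls later
print(propagation_delay(t, v_in, v_out, 0.0, 2.5))  # about 0.55 ns
```

The helper reports only the first crossing, so each worst-case transition should be measured on its own waveform window.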
2.4. AREA: The area is defined as the smallest rectangular box that can be drawn around the
design.
2.5. NAMING CONVENTIONS: You should label the inputs and outputs of the design as
shown in Figure 2. The least significant bits of the state metrics should be labeled as sm1[0] and
sm2[0], and the most significant bits should be labeled as sm1[7] and sm2[7]. The least significant
bits of the branch metrics should be labeled as bm1[0] and bm2[0], and the most significant bits
should be labeled as bm1[4] and bm2[4]. The newly computed state metric should be labeled as
nsm[7]-nsm[0] (MSB to LSB).
2.6. REGISTERS: You do not need to design any registers. The data flow from input to output
should be combinational logic.
2.7. CLOCKS: There should be no global clock, since the design is combinational. If you
choose dynamic logic, you are permitted a precharge/evaluate clock, but the result must become
available after ONE evaluate stage (no pipelined logic).
2.8. VOH, VOL, NOISE MARGINS: You are free to choose your logic swing. The noise
margins should be at least 10% of the voltage swing. Test this by computing the VTC between one
of the inputs and the output signals (with the other inputs set to the appropriate values) for a static
design. For a dynamic circuit, apply an input signal with a 10% noise value added to the input and
observe the outputs.
2.9. RISE AND FALL TIMES: All input signals have rise and fall times of 200 ps. The rise
and fall times of the output signals (10% to 90%) should not exceed 1 ns.
2.10. LOAD CAPACITANCE: Each output bit of the ACS unit should have a load
equivalent to 4 unit inverters (Wn = 0.25 µm, Wp = 0.50 µm).
3. Simulation
Analyze the circuit for functional correctness using either a switch level simulator or SPICE.
Identify the critical path of your circuit and simulate its delay in SPICE.
4. Report
The quality of your report is as important as the quality of your design. One must sell the design by
justifying the design decisions and by providing all the vital information. Be sure to emphasize
relevant information by eliminating unnecessary material. Organization, conciseness, and
completeness are of paramount importance. Do not repeat information we already know. Use
the templates provided on the web-page (Word and PDF formats). Make sure to fill in the cover page and use the correct units. A report has to be submitted at the end of each phase of the project.
You can e-mail an electronic submission of your report as a Word or PDF file to
msheets@eecs.berkeley.edu. You may also submit your report to the CalView office by regular
mail or FAX.
Your report should discuss your overall design philosophy and the important design decisions made
at the logic and circuit level. Discuss why your approach increases the operating speed or helps to
reduce energy or area, while meeting the performance specifications. Provide your current
estimates of the results and describe how you arrived at them. Include schematics and highlight the
important elements.
Prove that your results are accurate by providing the crucial plots (don’t forget to mention the input
patterns used to obtain those plots). The total report should not contain more than three pages. You
may add additional pages for important plots or figures, but try to keep the report succinct. The
organization of the report should be based on the following outline:
Page 1: Executive summary, overall design decisions, remarks, and motivations
Page 2: Logic and transistor diagrams, annotated with transistor sizes and worst-case
timing path. Plot showing the functional operation of the cell. Comments.
Page 3: Timing and energy simulations – derive the value of the worst-case path and
average energy. For the latter, a set of test patterns will be provided on the web
page.
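One common way to obtain the average energy is to integrate the supply current over the whole test-pattern run, multiply by the supply voltage, and divide by the number of patterns. The Python sketch below assumes the supply current has been exported as sampled data and uses trapezoidal integration. The function name and the sample values are hypothetical and are not results for this design.

```python
def average_energy(times, supply_current, vdd, num_patterns):
    """Average energy per applied pattern, in joules.

    times in seconds, supply_current in amperes, vdd in volts.
    """
    charge = 0.0
    for i in range(1, len(times)):
        dt = times[i] - times[i - 1]
        # Trapezoidal rule: mean current over the interval times its duration
        charge += 0.5 * (supply_current[i] + supply_current[i - 1]) * dt
    return vdd * charge / num_patterns


# Made-up samples: two triangular 1 mA current pulses, one per pattern
t = [0.0, 1e-9, 2e-9, 3e-9, 4e-9]
i_dd = [0.0, 1e-3, 0.0, 1e-3, 0.0]
print(average_energy(t, i_dd, vdd=2.5, num_patterns=2))  # about 2.5e-12 J
```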
Lastly, you are required to e-mail the SPICE INPUT DECK used to analyze the energy to
msheets@eecs.berkeley.edu. Remember, a good report is like a good layout: it should perform its
function (convey information) in the smallest possible area with the least delay and energy (required
by the reader) possible.
The quality of the report is an important (major) part of the grade!
The total project grade is divided between the two phases:
60% Phase 1
40% Phase 2
For each phase, the grade will be divided approximately as follows:
30% Results
20% Approach and correctness
40% Report
10% Creativity