UNIVERSITY OF CALIFORNIA
College of Engineering
Department of Electrical Engineering and Computer Sciences

Project Phase I Specification
NTU IC541CA (Fall 2001)

1. Designing a Viterbi Decoder - Background

The Viterbi algorithm is commonly used in a wide range of communications and data storage applications. It is used for decoding convolutional codes, in baseband detection for wireless systems, and for detection of recorded data in magnetic disk drives. The requirements for the Viterbi decoder, which is a processor that implements the Viterbi algorithm, depend on the application in which it is used. This results in a very wide range of required data throughputs and may impose area or power restrictions. The Viterbi detectors used in cellular telephones have low data rates (typically less than 1 Mb/s) but must have very low energy consumption. At the opposite end of the scale, very high speed Viterbi detectors are used in magnetic disk drive read channels, with throughputs over 600 Mb/s, but power consumption is not as critical. Since both of these are high volume applications, reduced silicon area can reduce cost significantly. In this semester’s project we will design a critical part of a Viterbi decoder, under different design constraints.

1.1. The Viterbi Algorithm

Any realistic transmission medium will have some distortion, so the job of the receiver is to figure out what the transmitter actually sent based upon the noisy signal it received. In a cell phone, there can be interference or low signal strength making it hard to decide what each bit is. In a hard drive, the disk is spinning so fast that the binary nature of the data tends to get blurry. In these cases, instead of outputting bits, the receiver outputs ‘soft symbols’. This description will assume that each soft symbol is a ‘fuzzy’ bit r, which is continuous over the range [0, 1].
An example output sequence from the receiver might begin as follows:

    time    0 us    10 us   20 us   30 us   40 us
    r       0.05    0.45    0.65    0.8     0.4

The Viterbi algorithm takes a sequence of soft symbols and determines the most likely sequence of real bits. Although your first instinct might be to just round each fuzzy bit to zero or one (a technique called slicing), this often turns out to be inaccurate in the presence of inter-symbol interference (ISI). ISI occurs when the values of previous bits affect the current bit.

A trellis diagram is a time-indexed version of a state machine, and the simplest 2-state trellis is shown in Figure 1. Each state corresponds to a possible pattern of recently received data bits and each branch corresponds to the receipt of the next (noisy) input. The goal is to find the maximum-likelihood path through the trellis, because that path corresponds to the most likely pattern that the transmitter actually sent.

Figure 1: Two-state trellis (states sm1 and sm2 at times t_{n-1} and t_n, connected by branches bm1 through bm4).

There are three basic components of a Viterbi decoder: the branch metric unit, the add-compare-select (ACS) unit, and the trace-back unit. The branch metric unit takes the fuzzy bit and calculates the cost for each branch of the trellis. A simple branch metric unit might use the squared Euclidean distance from the received symbol to the equalization target t_k (zero or one in our case):

    bm_k = (r - t_k)^2,  k = 1, 2, 3, 4

The add-compare-select unit is the heart of the Viterbi algorithm and calculates the state metrics. These state metrics accumulate the minimum cost of ‘arriving’ in a specific state. The branch metrics are added to the state metrics from the previous time instant and the smaller sum is selected as the new state metric:

    sm1_n = min(sm1_{n-1} + bm1, sm2_{n-1} + bm3)
    sm2_n = min(sm1_{n-1} + bm2, sm2_{n-1} + bm4)

At any point in time, the state with the smaller state metric terminates the most likely sequence. The trace-back unit can then output the sequence of branches used to get to that state.
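The branch metric, add-compare-select, and trace-back steps described above can be sketched in a few lines of Python. This is only an illustrative software model, not the hardware to be designed. The function name viterbi_two_state, the zero initial state metrics, and the default targets (branches into state 1 target 0, branches into state 2 target 1) are assumptions made for this sketch and are not given in the specification. With these particular targets there is no ISI, so the decoded path simply matches slicing; a channel with ISI would use a different set of targets.

```python
def viterbi_two_state(soft_symbols, targets=(0.0, 1.0, 0.0, 1.0)):
    """Two-state Viterbi sketch: branch metrics, ACS recursion, trace-back.

    targets[k-1] is the ASSUMED equalization target t_k of branch bm_k:
      bm1: state 1 -> state 1    bm2: state 1 -> state 2
      bm3: state 2 -> state 1    bm4: state 2 -> state 2
    States are indexed 0 (state 1) and 1 (state 2).
    """
    sm = [0.0, 0.0]   # state metrics sm1, sm2 (starting at zero is an assumption)
    decisions = []    # per time step: chosen predecessor of each state

    for r in soft_symbols:
        # Branch metric unit: squared distance to each branch target.
        bm1, bm2, bm3, bm4 = [(r - t) ** 2 for t in targets]

        # Add: candidate sums for each destination state.
        cand1 = (sm[0] + bm1, sm[1] + bm3)   # candidates for sm1
        cand2 = (sm[0] + bm2, sm[1] + bm4)   # candidates for sm2

        # Compare and select: keep the smaller sum, remember where it came from.
        prev1 = 0 if cand1[0] <= cand1[1] else 1
        prev2 = 0 if cand2[0] <= cand2[1] else 1
        decisions.append((prev1, prev2))
        sm = [cand1[prev1], cand2[prev2]]

    # Trace-back: start from the state with the smaller metric and walk
    # backwards through the stored decisions.
    state = 0 if sm[0] <= sm[1] else 1
    path = []
    for prev in reversed(decisions):
        path.append(state)
        state = prev[state]
    path.reverse()
    return path, sm


# Example soft symbols from the table above.
path, sm = viterbi_two_state([0.05, 0.45, 0.65, 0.8, 0.4])
print(path)  # state index reached at each time step (0 = state 1, 1 = state 2)
print(sm)    # final state metrics [sm1, sm2]
```

Passing a different targets tuple changes only the branch metric computation; the add-compare-select and trace-back steps stay the same.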
In theory, finding the most likely path would require processing the entire input sequence. In practice, the survivor paths merge after some number of iterations. The trellis depth at which all the survivor paths merge with high probability is referred to as the survivor path length.

1.2. Implementing the ACS Unit

The ACS unit is responsible for implementing the state metric computation. A direct implementation requires two identical datapaths, each with two additions, a comparison, and a selection (hence the name ACS). The basic datapath is shown in Figure 2. Note that although the branch metrics are positive 5-bit numbers, the state metrics require 7 bits. Because the state metrics keep accumulating, they will eventually overflow, so a trick called modulo normalization is used. The upshot is that one extra bit is required in the state metric (8 total) and the MSB of a simple subtraction can be used for the compare operation. Lastly, an array of 2:1 multiplexers is used to pick the minimum.

Figure 2: Block diagram of ACS unit (inputs bm1 and bm2, 5 bits each; inputs sm1 and sm2, 8 bits each; the MSB of the subtraction drives the select input S of the multiplexer; output nsm, 8 bits).

2. Implementation and Constraints

The goal is to design an ACS unit to be used in a Viterbi decoder. The project will be completed in TWO phases.

PHASE 1: Design conception, schematic capture, simulation
PHASE 2: Layout, simulation with parasitics, comparison to schematic simulation

PHASE 1 GOALS: The goal of the first phase is to perform the logic optimization, circuit style selection, and first-order COMBINATIONAL circuit optimization to implement an efficient design. Fine tuning can be done in the second phase after layout constraints become clearer.

You should select one of the following design CASES:

a) Low data throughput: Design a single ACS such that the average energy is minimized while still meeting the constraint that the worst-case delay is smaller than 50 ns! No constraints are put on the area.
b) High data throughput: Maximize the single ACS operating speed.
No constraints are put on area or power.
c) Low area decoder: Minimize the area of a single ACS, while meeting the constraint that the worst-case delay is smaller than 50 ns! No constraints are put on energy consumption.

You are free to choose any logic family for the implementation: complementary CMOS, pseudo-NMOS, pass-transistor logic, dynamic logic, etc.

2.1. TECHNOLOGY: The design is to be implemented in a 0.25 µm CMOS process with 4 metal layers. The SPICE technology model is in the g25.mod file available on the web site.

2.2. POWER SUPPLY: You are free to choose any supply voltage and logic swing up to a maximum of 2.5 V. Make sure that you use the appropriate model when you perform hand analysis.

2.3. PERFORMANCE METRIC: The propagation delay for static designs is defined as the time interval between the 50% transition point of the inputs and the 50% point of the worst-case output signal. Make sure you pick the worst-case condition and state EXPLICITLY in your report what that condition is. Note that for dynamic designs the propagation delay is defined in this case as the delay of the evaluate phase ONLY (at least in this phase of the project)!

2.4. AREA: The area is defined as the smallest rectangular box that can be drawn around the design.

2.5. NAMING CONVENTIONS: You should label the inputs and outputs of the design as shown in Figure 2. The least significant bits of the state metrics should be labeled sm1[0] and sm2[0], and the most significant bits should be labeled sm1[7] and sm2[7]. The least significant bits of the branch metrics should be labeled bm1[0] and bm2[0], and the most significant bits should be labeled bm1[4] and bm2[4]. The newly computed state metric should be labeled nsm[7] through nsm[0] (MSB to LSB).

2.6. REGISTERS: You do not need to design any registers. The data flow from input to output should be combinational logic.

2.7. CLOCKS: There should be no global clock, since the design is combinational.
If you choose dynamic logic, you are permitted a precharge/evaluate clock, but the result must become available after ONE evaluate stage (no pipelined logic).

2.8. VOH, VOL, NOISE MARGINS: You are free to choose your logic swing. The noise margins should be at least 10% of the voltage swing. For a static design, test this by computing the VTC between one of the inputs and the output signals (with the other inputs set to the appropriate values). For a dynamic circuit, apply an input signal with a 10% noise value added to the input and observe the outputs.

2.9. RISE AND FALL TIMES: All input signals have rise and fall times of 200 ps. The rise and fall times of the output signals (10% to 90%) should not exceed 1 ns.

2.10. LOAD CAPACITANCE: Each output bit of the ACS unit should have a load equivalent to 4 unit inverters (Wn = 0.25 µm, Wp = 0.50 µm).

3. Simulation

Analyze the circuit for functional correctness using either a switch-level simulator or SPICE. Identify the critical path of your circuit and simulate its delay in SPICE.

4. Report

The quality of your report is as important as the quality of your design. You must sell the design by justifying the design decisions and by providing all the vital information. Be sure to emphasize relevant information by eliminating unnecessary material. Organization, conciseness, and completeness are of paramount importance. Do not repeat information we already know. Use the templates provided on the web page (Word and PDF formats). Make sure to fill in the cover page and use the correct units.

A report has to be submitted at the end of each phase of the project. You can e-mail an electronic submission of your report as a Word or PDF file to msheets@eecs.berkeley.edu. You may also submit your report to the CalView office by regular mail or FAX.

Your report should discuss your overall design philosophy and the important design decisions made at the logic and circuit level.
Discuss why your approach increases the operating speed or helps to reduce energy or area, while meeting the performance specifications. Provide your current estimates of the results and describe how you arrived at them. Include schematics and highlight the important elements. Prove that your results are accurate by providing the crucial plots (don’t forget to mention the input patterns used to obtain those plots).

The report itself should not exceed three pages. You may add additional pages for important plots or figures, but try to keep the report succinct. The organization of the report should be based on the following outline:

Page 1: Executive summary, overall design decisions, remarks, and motivations.
Page 2: Logic and transistor diagrams, annotated with transistor sizes and worst-case timing path. Plot showing the functional operation of the cell. Comments.
Page 3: Timing and energy simulations: derive the value of the worst-case path and average energy. For the latter, a set of test patterns will be provided on the web page.

Lastly, you are required to e-mail the SPICE INPUT DECK used to analyze the energy to msheets@eecs.berkeley.edu.

Remember, a good report is like a good layout: it should perform its function (convey information) in the smallest possible area with the least delay and energy (required by the reader) possible. The quality of the report is an important (major) part of the grade!

The total project grade is divided between the two phases:

60% Phase 1
40% Phase 2

For each phase, the grade will be divided approximately as follows:

30% Results
20% Approach and correctness
40% Report
10% Creativity