Digital Signal Processors for Mobile Phone Terminals
Katsuhiko Ueda
Matsushita Electric Ind. Co., Ltd.
K. Ueda, '99 VLSI Circuits Short Course

Abstract
- A DSP is one of the key components in a digital cellular phone terminal.
- This talk discusses the role of the DSP in the terminal and how to achieve high performance with low power consumption.
- Mobile phone systems are moving rapidly into the mobile multimedia era. A DSP architecture suitable for this new era is also discussed.

Outline
1. The role of the DSP in a cellular phone terminal.
2. How to achieve high performance with low power consumption.
3. Issues for DSPs in next generation mobile phone systems.
4. Mobile multimedia DSP.
5. Summary.

Architecture of Portable Phone Terminal
[Block diagram: Receiver, Synthesizer, Transmitter; Demodulator, Equalizer, Channel CODEC, Speech CODEC, Speaker, Microphone; Modulator; Microcomputer, Keypad/Display]
Role of DSP
- Speech CODEC
- Channel CODEC
- Equalizer
- Mod/Demodulator

Relationship between Speech Signal and Tx Signal (PDC)
[Block diagram: Receiver, Demodulator, Synthesizer, Transmitter, Modulator, Channel CODEC, Speech CODEC, Speaker, Microphone, with Point A and Point B marked]
Point A: 8 bits (μ-law) every 125 us (8 kHz) -> 2,560 bits/40 ms -> 64 kbps (8 bits * 8 kHz)
- Speech: 138 bits/40 ms -> 3.45 kbps
- FEC: 86 bits/40 ms -> 2.15 kbps
- Total: 5.6 kbps
Point B: 42 kbps. 1 TDMA frame (40 ms) carries six user slots (User 1 to User 6); 1 slot is ~6.7 ms, 7 kbps.

Speech CODEC
[Block diagram: Digitized Input Speech Signal, Spectrum Analysis, Gain, Code Book, Synthesis Filter, Parameter; the coder minimizes the difference between the input speech signal and the synthesized speech signal]
[Plot: processing load (MOPS, 0 to 20) versus bit rate (kbps, 0 to 20); labels: PSI-CELP, 13kbps, VSELP, 11.2kbps VSELP, RPE-LTP]

Dedicated DSP for Portable Phone
[Block diagram of a BASIC DSP: POINTER UNIT (ADR CTRL, ex. modulo, bit reverse; POINTERs), CONTROL (ex. repeat, loop), INST MEMORY, DECODER, DATA MEMORY (MEMORY X, MEMORY Y), I/O (DMA CTRL, SERIAL, PARALLEL), DPU (MUL, ALU, SAT, REG (ACC))]
Dedicated DSP for Portable Phone
- Increase performance by adding accelerators
- Reduce power consumption: Pc ∝ f * c * v^2 (f: frequency, c: capacitance, v: voltage)

A DSP Architecture for Portable Phone Terminal
[Block diagram: DATA MEMORY (Data ROM, Data RAM, double access); PROGRAM CONTROL (INST ROM, IR, DEC, IP, STACK); PU (AMA, AMB); I/O (DMA CONT, SERIAL, PARALLEL); M BUS 16, A BUS 16, B BUS 16; EXT CLK, PLL, CLK GEN; DSP-CORE with DPU (BSFT, DATA REGS, RB-MAC, ALU, SAT, ACS, ACC, MAC)]
- Special memory scheme to realize double speed MAC
- Viterbi accelerator
- Dedicated MAC unit
  - Double speed MAC scheme
  - Redundant binary number system

Double Speed MAC Scheme
[Block diagram: POINTER X (PX) and POINTER Y (PY) address MEMORY X (MX) and MEMORY Y (MY); each memory has a 16-bit EVEN SIDE and a 16-bit ODD SIDE, accessed once per machine cycle; 16-bit TEMP REGs feed the 16-bit A-BUS and B-BUS; MAC UNIT: MULTIPLIER (1/2 machine cycle), 32-bit PIPELINE REG, ADDER (1/2 machine cycle), 40-bit ACC, BARREL SHIFTER]
Bus and multiplier timing (one operand pair every 0.5 cycle):
A-BUS:      D(2x)  D(2x+1)  D(2x+2)  D(2x+3)  D(2x+4)  D(2x+5)
B-BUS:      D(2y)  D(2y+1)  D(2y+2)  D(2y+3)  D(2y+4)  D(2y+5)
MULTIPLIER: D(2x)*D(2y)  D(2x+1)*D(2y+1)  D(2x+2)*D(2y+2)  D(2x+3)*D(2y+3)  D(2x+4)*D(2y+4)  D(2x+5)*D(2y+5)

Accelerator for Viterbi Decoding
[Diagram: trellis step from PM0(t-1) and PM1(t-1), with branch metrics BMa(t) and BMb(t), to PM0(t); ALU split into lower 8 bits and upper 8 bits (Add), COMPARATOR (Compare), REG, SHIFT REG (Select)]
PM0(t) = min[(PM0(t-1)+BMa(t)), (PM1(t-1)+BMb(t))]
Two Adds, one Compare and one Select -> ACS operation
- Normal operation: The ALU is used as a 16-bit processing unit.
- ACS operation: The ALU is used as two 8-bit adders.
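
The ACS operation on the Viterbi accelerator slide can be illustrated with a small software model. This is only a sketch under stated assumptions, not the hardware described in the talk: the function names, the packing order (PM0 and BMa in the lower 8 bits, PM1 and BMb in the upper 8 bits), the 8-bit wraparound, and the tie-break toward the lower half are all choices made here for illustration.

```python
def acs(pm0_prev, pm1_prev, bm_a, bm_b):
    """Reference ACS: PM0(t) = min(PM0(t-1)+BMa(t), PM1(t-1)+BMb(t)).

    Returns (new path metric, decision bit); decision 0 means the
    PM0 branch survived, 1 means the PM1 branch survived.
    """
    cand0 = pm0_prev + bm_a   # first Add
    cand1 = pm1_prev + bm_b   # second Add
    if cand0 <= cand1:        # Compare (tie-break is an assumption)
        return cand0, 0       # Select
    return cand1, 1


def packed_acs(pm_word, bm_word):
    """Model of a 16-bit ALU used as two independent 8-bit adders.

    Assumed layout (hypothetical): lower byte = PM0 / BMa,
    upper byte = PM1 / BMb. No carry crosses the byte boundary.
    """
    lower = ((pm_word & 0xFF) + (bm_word & 0xFF)) & 0xFF
    upper = (((pm_word >> 8) & 0xFF) + ((bm_word >> 8) & 0xFF)) & 0xFF
    decision = 0 if lower <= upper else 1
    survivor = lower if decision == 0 else upper
    return survivor, decision


# PM0(t-1)=3, PM1(t-1)=5, BMa(t)=4, BMb(t)=1
pm_word = (5 << 8) | 3
bm_word = (1 << 8) | 4
print(acs(3, 5, 4, 1))
print(packed_acs(pm_word, bm_word))
```

Both calls give the same survivor as long as the sums fit in 8 bits; a real design would also have to handle metric overflow, which this sketch ignores.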

Effect of Accelerators
Comparison of the number of clock cycles needed to realize an 11.2 kbps VSELP CODEC.
[Bar chart, 0 to 100%: "DSP w/o MAC & Viterbi Accelerators" versus "DSP w/ MAC & Viterbi Accelerators"; categories: Misc, Block Floating, Error Correction, MAC, ALU]
- Total reduction: 33.1%
- Per-category reductions shown in the chart: 9.0%, 4.7%, 8%, and 11.4%

Dual MAC Scheme
FIR filtering: two outputs in parallel with a delay register
y(0) = c(0)x(0) + c(1)x(-1) + c(2)x(-2) + c(3)x(-3) + ...
y(1) = c(0)x(1) + c(1)x(0) + c(2)x(-1) + c(3)x(-2) + ...
y(2) = c(0)x(2) + c(1)x(1) + c(2)x(0) + c(3)x(-1) + ...
y(3) = c(0)x(3) + c(1)x(2) + c(2)x(1) + c(3)x(0) + ...
[Diagram: coefficient c(i) feeds both MAC1 and MAC0; input x(n-i+1) passes through a delay register (REG) to give x(n-i); accumulators Acc1 and Acc0]

                       Single MAC   Dual MAC   Dual MAC with REG
# of MAC operations    N            N          N
# of memory reads      2N           2N         N
-> Low power consumption

MAC Unit using Redundant Binary Number
[Block diagram: A-BUS (16 b) and B-BUS (16 b) -> RBMU (MUL, 0.5 cycle) -> 24 b -> P-Reg 1 -> RBAU (ACC, 0.5 cycle; RBA 1, RBA 2, RBA 3) -> P-Reg 2 -> 40 b -> RTBC (RB->B CNV, 0.5 cycle) -> 40 b ACC]
RBMU: Redundant Binary Multiply Unit
RBAU: Redundant Binary Accumulation Unit
RTBC: Redundant Binary Digit to Binary Digit Conversion Unit
[Bar chart: power consumption ratio normalized to a BW-MAC (0 to 100%), BW-MAC versus RB-MAC; components: Mmux1, Conv, Preg2, FA Tree2 (BW), RBA Tree2 (RB), Preg1, FA Tree1 (BW), RBA Tree1 (RB), Partial Product Gen., Encoder]

A System LSI realizing Base Band Processing & Control
[Block diagram: Receiver, Synthesizer, Transmitter; Demodulator, Digital Signal Processor, Modulator, Misc, TDMA Controller, Audio I/F, Speaker, Microphone, VCO, Microcomputer, Keypad/Display]
Features of the LSI
Process Tech.       0.35 um CMOS
# of Transistors    2.5 Million
Die Size            9.26 x 10.0 mm
Package             11 x 11 mm CSP
16b MCU             MN102L (3 MIPS)
16b DSP             MN1930 (40 MIPS)
Integrated IP       Demodulator, TDMA Controller, VCO, etc.
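
The memory-read saving claimed for the dual MAC scheme with a delay register can be checked with a small counting model. This is a sketch under assumptions, not the actual datapath: the function names are hypothetical, samples outside the input buffer are treated as zero, and the one extra read that primes the delay register is counted, so two outputs cost 2N + 1 reads (about N per output, as in the table) instead of 4N.

```python
def sample(x, k):
    """x(k), with samples outside the buffer treated as zero (assumption)."""
    return x[k] if 0 <= k < len(x) else 0


def fir_single_mac(c, x, n):
    """y(n) = sum_i c(i) * x(n-i) on one MAC: two memory reads per tap."""
    acc, reads = 0, 0
    for i in range(len(c)):
        acc += c[i] * sample(x, n - i)
        reads += 2                      # one read for c(i), one for x(n-i)
    return acc, reads


def fir_dual_mac_reg(c, x, n):
    """Compute y(n) and y(n+1) together, sharing c(i) and a delay register."""
    acc0 = acc1 = 0
    reg = sample(x, n + 1)              # prime REG with x(n+1)
    reads = 1
    for i in range(len(c)):
        ci = c[i]                       # read c(i) once, used by both MACs
        xi = sample(x, n - i)           # read x(n-i) once
        reads += 2
        acc1 += ci * reg                # MAC1: c(i) * x(n+1-i), from REG
        acc0 += ci * xi                 # MAC0: c(i) * x(n-i)
        reg = xi                        # x(n-i) is x(n+1-(i+1)) for the next tap
    return acc0, acc1, reads


c = [1, 2, 3, 4]
x = [1, 2, 3, 4, 5, 6]
print(fir_single_mac(c, x, 3), fir_single_mac(c, x, 4))
print(fir_dual_mac_reg(c, x, 3))
```

With N = 4 taps, the single MAC needs 8 reads per output, while the register-sharing version needs 9 reads for both outputs together.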

Goal of the Next Generation Mobile Phone System
[Illustration: Current System (PDC), 5.6 / 11.2 kbps -> Next Generation System (W-CDMA), 8 kbps ~ 2 Mbps; High Speed Wireless Network, Video Phone]
System Requirements
- High bit rate data transfer
  -> MORE cycles for error correction
  -> MORE data input/output to/from DSP
- Video CODEC capability
- Of course, LOW POWER

Increasing Capability of Access Systems
[Diagram: time/frequency allocation for users A and B under each access scheme]
- FDMA (Frequency Division Multiple Access): analog system. Users are separated by frequency (f1, f2, f3, f4).
- TDMA (Time Division Multiple Access): digital system, ex. PDC, GSM, IS-54, PHS. Users share a frequency and are separated by time slot (f1,s1; f1,s2).
- CDMA (Code Division Multiple Access): digital system, ex. IS-95, W-CDMA. Users share a frequency and are separated by code (f1,c1; f1,c2).
Moving from FDMA to TDMA to CDMA increases:
- Channel capacity
- Data speed
-> System complexity (DSP performance)

Next Generation Mobile Phone Terminal and Issues for LSIs
[Block diagram: RF unit (Receiver, DUP, Transmitter); A/D, D/A; Base band signal processing unit (De-spread, Rake, Spread, Channel CODEC); Speech/Video CODEC; Control unit; DSP; Data, Voice, Video]
- Low-power & high speed correlator
- High speed Viterbi/Turbo decoder
- Low-power & high speed A/D
- High speed & low-power LSI
- Low-power Video/Audio CODEC LSI

DSP Architecture for the Next Generation System
[Block diagram: CORE with PU (AMA, AMB, address regs), Instruction Memory, Data Memory, DPP, M BUS 16, PCU, IOU, DPU, ICU, CKU]
- Addressing: Trace back, Modulo, Bit reverse
- Wide bandwidth to/from DSP: RF; ADC, DAC; Spreading/Despreading
- High performance processing unit for Viterbi decoding
CKU: Clock control Unit
DPP: Direct Parallel Port
DPU: Data Processing Unit
ICU: Interrupt Control Unit
IOU: data I/O Unit
PCU: Program flow Control Unit
PU: Pointer Unit
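
The code-division idea behind the CDMA slide, and the correlator and spreading/despreading blocks listed for the next generation terminal, can be sketched in a few lines. Everything specific here is an assumption for illustration only: the length-4 orthogonal codes, the +1/-1 symbol mapping, and an ideal synchronous, noise-free channel. It is not the W-CDMA spreading scheme itself.

```python
# Two orthogonal length-4 codes (hypothetical choice); their dot product is 0.
C1 = [1, 1, 1, 1]
C2 = [1, -1, 1, -1]


def spread(symbols, code):
    """Replace each +1/-1 symbol with the code multiplied by that symbol."""
    return [s * chip for s in symbols for chip in code]


def despread(signal, code):
    """Correlate each code-length block with the code and take the sign."""
    n = len(code)
    out = []
    for k in range(0, len(signal), n):
        corr = sum(s * c for s, c in zip(signal[k:k + n], code))
        out.append(1 if corr > 0 else -1)
    return out


user_a = [1, -1, 1]      # user A, code c1
user_b = [-1, -1, 1]     # user B, code c2

# Both users transmit on the same frequency at the same time:
# the channel carries the chip-wise sum of the two spread signals.
channel = [p + q for p, q in zip(spread(user_a, C1), spread(user_b, C2))]

print(despread(channel, C1))   # recovers user A's symbols
print(despread(channel, C2))   # recovers user B's symbols
```

The correlation in `despread` is one multiply-accumulate per chip per user, which is why the chip-rate correlator is singled out as a low-power, high speed design issue.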

Dual ACS Operation
2 path metrics are updated in 1 cycle
[Diagram: trellis step from T-1 to T, PM0 and PM1 -> PM0' and PM1'; Data Memory supplies 32-bit {PM1,PM0} and 32-bit {BM1,BM0} to registers; ALU units AU1 and AU0 form {BM1+PM0}, {BM0+PM1} and {BM0+PM0}, {BM1+PM1}; comparators (CMPR, COMP) select between each pair; ASR1 and ASR2; results go to Data Memory]
[Bar chart: required MIPS (20 to 80), Conventional versus This scheme, at data rates of 8 Kbps (VOICE), 32 Kbps (DATA), and 64 Kbps (DATA)]

DSP for Wireless Video Phone
[Block diagram: SDRAM, DMA Controller, Video I/F (Video In, AD; Video Out, DA), Double Buffers, Shared Memory, Shared Register, Host I/F, CPU; Main-Processor and Sub-Processor, each with a DSP Core, Dedicated Engine, and Local Memory]
Features
Clock frequency      67.5 MHz (14.8 nsec)
Technology           0.25 um CMOS (4-Metal)
Number of devices    7,670 KTr.
Die size             9.41 x 9.22 (= 86.76) mm^2
Supply voltage       1.8 V (Internal), 3.3 V (IO)
Performance          4 GOPS, 15 frames/sec (CIF CODEC)

Summary
1. The DSP is one of the key components in a portable phone terminal, and high performance with low power consumption is essential.
2. In the mobile multimedia era, new DSPs with higher performance and increased functionality will be necessary.
3. DSPs for portable phones must keep achieving MORE MIPS and LESS power consumption.

Mobile Phone for Internet & Data Communication
Ex. i-mode system provided by NTT DoCoMo
Applications
- E-mail
- Web browsing
- Banking
- Location services combined with a car navigation system
- etc.
[Photo: Panasonic P502i]