Module 5: Programmable Components in SoC I
Chan-Ho Lee (Soongsil University, School of Information, Communication and Electronic Engineering)

SoC Architecture 5. Programmable processor components in SoC I
Copyrightⓒ2003

Contents
1. Introduction
   1.1 RISC machine
   1.2 About the ARM architecture
   1.3 Architecture versions
   1.4 Performance comparison
2. Processor architecture
   2.1 Processor modes
   2.2 Registers
   2.3 Instruction format
   2.4 About Thumb instructions
   2.5 Memory model
3. Organization
   3.1 3-stage pipeline organization
   3.2 5-stage pipeline organization
   3.3 Multiplier
4. Processor cores
   4.1 Architecture evolutions
   4.2 ARM7 Thumb family
   4.3 StrongARM
   4.4 ARM9 family
   4.5 ARM9E family
   4.6 ARM10 family
   4.7 ARM11 family
   4.8 X-Scale
5. ARM development environment
   5.1 Real-time debug and trace
   5.2 On-chip debug technology
   5.3 ARM development environment
   5.4 RealView development tools
6. IP solutions
   6.1 AMBA
   6.2 PrimeCell peripherals
7. ARM applications
   7.1 Network microcontroller
   7.2 The Psion Series 5MX
   7.3 GSM system
   7.4 OneC VWS22100 GSM chip

1. Introduction

1.1 RISC machine (1/3)
RISC architecture [1]
- Fixed instruction size (e.g., 32 bits)
- Load-store architecture
  - Operands must be located in registers
  - The result of an operation is written to a register
- Large register file
- Simple addressing modes
RISC organization
- Hard-wired instruction decoding logic
- Pipelined execution
- Single-cycle execution

1.1 RISC machine (2/3)
Advantages
- Simple hardware
  - Small die size
  - Low power consumption
  - Simple decoding
- Higher performance
  - Easy to implement an effective pipelined structure
Disadvantages
- Poor code density
  - RISC uses a fixed-size instruction format
  - Small number of instructions

1.1 RISC machine (3/3)
Summary of the MIPS R2000 and Intel 80386 architectures [17]

Feature                                  | MIPS R2000                  | Intel 80386
Date announced                           | 1986                        | 1985
Instruction size (bits)                  | 32                          | Variable
Address space (size, model)              | 32 bits, flat               | 32 bits, segmented with paging support
Data alignment                           | Aligned                     | Not required
Data addressing modes                    | 2                           | 11
Protection                               | Page                        | Segmented scheme
Integer registers (number, model, size)  | 31 GPR x 32 bits            | 8 GPR x 32 bits, 6 segment registers x 16 bits, 2 other x 16 bits
Separate floating-point registers        | 16 x 32 or 16 x 64 bits     | 8 x 80 bits
Floating-point format                    | IEEE 754 single, double     | IEEE 754 single, double, extended

1.2 About the ARM architecture
The ARM architecture [2]
- RISC plus additional features
- Holds almost 75% of the 32-bit embedded RISC microprocessor market
Additional features of ARM
- Auto-increment/decrement addressing modes
- A single data-processing instruction can perform both an ALU and a shifter operation
- Load/store multiple instructions
- Conditional execution

1.3 Architecture versions [3] (1/3)
[Figure: ARM architecture versions]

1.3 Architecture versions (2/3)
v4
- The oldest version of the architecture still supported today
- 32-bit address space
- T variant: 16-bit Thumb instruction set
- M variant: long multiply (64-bit result)
v5
- Improved ARM/Thumb interworking
- CLZ instruction
- E variant: enhanced DSP instruction set
- J variant: acceleration of Java byte-code execution
v6
- Improved memory system
- Support for Single Instruction Multiple Data (SIMD)

1.3 Architecture versions (3/3)
Architecture variants
- T: 16-bit Thumb instruction set
- D: on-chip debug support
- M: hardware long multiplier
- I: EmbeddedICE
- E: DSP extension
- S: synthesizable core
- J: Jazelle Java accelerator
- ~20: with cache and MMU
- ~40: with cache and a protection unit rather than an MMU
- ~22: smaller cache than ~20

1.4 Performance comparison
[Figure: performance comparison]

2. Processor Architecture [2]

2.1 Processor modes

Mode        | Reg.       | CPSR[4:0] | Use
User        | usr        | 10000     | Normal program execution mode with restricted system resources
FIQ         | fiq        | 10001     | Processing fast interrupts
IRQ         | irq        | 10010     | Processing general-purpose interrupts
Supervisor  | svc        | 10011     | Processing software interrupts
Abort       | abt        | 10111     | Processing memory faults
Undefined   | und        | 11011     | Handling undefined instruction traps
System      | sys (=usr) | 11111     | Running privileged OS tasks (ARM architecture v4 and above)

2.2 Registers (1/3)
[Figure: register organization]
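As a rough illustration of how the CPSR[4:0] encodings in the Section 2.1 table select a processor mode, and with it a register bank, the following Python sketch decodes the mode field. The names MODES, decode_mode and register_bank are hypothetical helpers introduced here for illustration; only the bit patterns and the note that System mode shares the User registers come from the table.

```python
# CPSR[4:0] -> (mode name, register-bank suffix), copied from the Section 2.1 table.
MODES = {
    0b10000: ("User", "usr"),
    0b10001: ("FIQ", "fiq"),
    0b10010: ("IRQ", "irq"),
    0b10011: ("Supervisor", "svc"),
    0b10111: ("Abort", "abt"),
    0b11011: ("Undefined", "und"),
    0b11111: ("System", "sys"),
}


def decode_mode(cpsr):
    """Return the mode name encoded in the low five bits of a CPSR value."""
    bits = cpsr & 0x1F
    if bits not in MODES:
        raise ValueError("CPSR[4:0] = {:05b} is not a mode listed in the table".format(bits))
    return MODES[bits][0]


def register_bank(cpsr):
    """Return the bank suffix used for the banked registers in this mode.

    System mode is marked '=usr' in the table, so it maps to the User bank.
    """
    decode_mode(cpsr)  # validates the mode field
    suffix = MODES[cpsr & 0x1F][1]
    return "usr" if suffix == "sys" else suffix


if __name__ == "__main__":
    for value in (0x10, 0xD3, 0x1F):  # example CPSR values chosen for illustration
        print(hex(value), decode_mode(value), register_bank(value))
```

Keeping the table as a dictionary makes an unlisted bit pattern fail loudly instead of silently mapping to some default mode.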

2.2 Registers (2/3)
Visible registers
- 31 general-purpose registers and 6 program status registers in total
- At any time, 16 general-purpose registers and one or two status registers are visible, depending on the processor mode
General-purpose registers (GPR)
- Unbanked registers: R0-R7, R15
  - The same physical registers in all processor modes
- Banked registers: R8-R14
  - The physical register each name refers to depends on the current processor mode
- Special functions of R13-R15
  - Stack pointer (R13)
  - Link register (R14): saves the return address
  - Program counter (R15): points to the address of the instruction to be fetched

2.2 Registers (3/3)
Program status registers (PSR)
- CPSR (Current PSR)
- SPSR (Saved PSR)
  - Each exception mode has its own SPSR
  - It preserves the value of the CPSR when the exception occurs

2.3 Instruction format
Example: ADDEQS Rd, Rn, Rm, LSL #2

Bits   | 31-28 | 27-26 | 25 | 24-21  | 20 | 19-16 | 15-12 | 11-7   | 6-5 | 4 | 3-0
Field  | cond  | 00    | #  | opcode | S  | Rn    | Rd    | #shift | Sh  | 0 | Rm

- 3-address format
- Conditional execution
- Specification of flag update
- A shifted operand (#shift, Sh)
[Figure: cond drives the condition evaluation, which enables the load of Rd; Rn and Rm are read from the register bank, Rm passes through the shifter into the ALU, and the flags are updated if S == 1.]

2.4 About Thumb instructions (1/3)
Thumb instruction set
- A re-encoded subset of the most commonly used ARM instructions
- 16-bit format, to allow better code density
- 32-bit performance at 8/16-bit system cost
- At least a few 32-bit ARM instructions are still needed
  - On an exception, the processor switches to ARM state
  - PSR-manipulating instructions can be executed only in ARM state
Thumb state
- The T bit in the CPSR is 1
Thumb entry
- By executing a BX instruction

2.4 About Thumb instructions (2/3)
Registers
- Visible GPRs
  - Lo registers (r0-r7)
- Special-purpose registers, accessed by some Thumb instructions
  - Program counter (r15)
  - Link register (r14)
  - Stack pointer (r13)
- Restricted register access
  - Only a few instructions allow the 'High' registers (r8-r15) to be specified

2.4 About Thumb instructions (3/3)
Thumb-ARM similarities
- Load-store architecture
- Support for 8-bit bytes, 16-bit half-words and 32-bit words
  - Half-words are aligned on 2-byte boundaries
  - Words are aligned on 4-byte boundaries
- A 32-bit unsegmented memory
Thumb-ARM differences
- All Thumb instructions except branches are executed unconditionally
- 2-address format
- Fewer addressing modes than ARM

2.4.1 Thumb implementation [1] (1/2)
Implementation in a 3-stage pipeline
[Figure: data in from memory enters the instruction pipeline; a mux selects the high or low half-word for the Thumb decompressor; a second mux selects the ARM or Thumb stream for the ARM instruction decoder; immediate fields go to the B operand bus.]

2.4.1 Thumb implementation (2/2)
Instruction mapping
Thumb code: ADD Rd, #imm8

Bits   | 15-13 | 12-11 | 10-8 | 7-0
Field  | 001   | 10    | Rd   | #imm8

- 001: major opcode, format 3 (MOV/CMP/ADD/SUB with immediate)
- 10: minor opcode denoting ADD and set condition codes

Equivalent ARM code: ADDS Rd, Rd, #imm8

Bits   | 31-28 | 27-26 | 25 | 24-21 | 20 | 19-16 | 15-12 | 11-8 | 7-0
Field  | 1110  | 00    | 1  | 0100  | 1  | 0 Rd  | 0 Rd  | 0000 | #imm8

- 1110: 'always' condition
- 0 Rd (twice): destination and source register
- 0000: zero shift
- #imm8: immediate value
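The instruction mapping of Section 2.4.1 can be sketched in a few lines of Python. This is only an illustration of the single case shown on the slide (Thumb format 3 with minor opcode 10, i.e. ADD with an 8-bit immediate); the function name thumb_add_imm8_to_arm is hypothetical, and a real decompressor is combinational logic that handles every Thumb format.

```python
def thumb_add_imm8_to_arm(halfword):
    """Expand a Thumb 'ADD Rd, #imm8' into the ARM 'ADDS Rd, Rd, #imm8' word.

    Only the mapping shown in Section 2.4.1 is handled; any other
    encoding raises ValueError.
    """
    major = (halfword >> 13) & 0b111   # bits 15-13: major opcode
    minor = (halfword >> 11) & 0b11    # bits 12-11: minor opcode
    if major != 0b001:
        raise ValueError("not a format 3 (MOV/CMP/ADD/SUB immediate) instruction")
    if minor != 0b10:
        raise ValueError("only the ADD case is sketched here")

    rd = (halfword >> 8) & 0b111       # bits 10-8: Lo register r0-r7
    imm8 = halfword & 0xFF             # bits 7-0: immediate value

    return (
        (0b1110 << 28)    # 'always' condition
        | (0b00 << 26)    # data-processing class
        | (1 << 25)       # '#' bit: operand 2 is an immediate
        | (0b0100 << 21)  # opcode ADD
        | (1 << 20)       # S bit: update the flags
        | (rd << 16)      # Rn = 0:Rd (source)
        | (rd << 12)      # Rd = 0:Rd (destination)
        | (0b0000 << 8)   # zero shift (rotate) field
        | imm8
    )


if __name__ == "__main__":
    # 0x3105 encodes the Thumb instruction ADD r1, #5.
    print(hex(thumb_add_imm8_to_arm(0x3105)))
```

Because the 3-bit Thumb register field is zero-extended to 4 bits, the expanded instruction can only name r0-r7, which matches the restricted register access described in Section 2.4.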

3. Organization

3.1 3-stage pipeline organization (1/3)
Organization
- Register bank
  - 31 GPRs, 6 PSRs
  - 2 read ports, 1 write port
  - An additional read port and write port for the PC
- Barrel shifter
- ALU
- Address generating block
  - Address register
  - Incrementer
  - Address selector
- IO registers
  - Instruction pipeline
  - Read data register
  - Byte replicator
- Control logic
  - External interface
  - Instruction decoder
  - Datapath control
[Figure: 3-stage datapath with address register and incrementer driving A[31:0], PC, register bank, multiply register, A bus, B bus, ALU bus, barrel shifter, ALU, instruction decode and control, and data-in/data-out registers on D[31:0].]

3.1 3-stage pipeline organization (2/3)
Pipeline stages
- Fetch
  - Instruction fetch from memory
- Decode
  - Instruction decoding
  - Datapath control signals are prepared for the next cycle
- Execute
  - Reading registers
  - Shift and ALU operations
  - Writing back to the register bank
[Figure: three consecutive data-processing (DP) instructions at PC, PC+i and PC+2i, each passing through F, D, E one cycle behind the previous one.]

3.1 3-stage pipeline organization (3/3)
[Figure: pipeline timing for a branch and for LDR. The branch (B) at PC passes through F, D, E1, E2, E3; the instructions fetched at PC+i and PC+2i are discarded, and execution continues with F, D, E at the target T and T+i. LDR passes through F, D and three execute cycles labelled calc, xfer and move.]

3.1.1 Multiple load/store instruction
[Figure: pipeline timing for LDM: F, D, then cycles labelled A1 ... An, L1 ... Ln and M1 ... Mn, followed by the F, D, E stages of the next instructions.]

3.2 5-stage pipeline organization (1/4)
To increase performance [1]
- Increase the clock rate
  - Simplify each pipeline stage
  - Increase the number of pipeline stages
- Reduce the average number of clock cycles per instruction (CPI)
  - Avoid the von Neumann bottleneck
  - Exploit a Harvard architecture
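The two levers listed above (clock rate and CPI) combine in the usual relation: program time = instruction count x CPI / clock frequency. The short Python sketch below works one example; the instruction count, CPI values and clock rates are made-up illustrative numbers, not measurements of any ARM core, and the function names are hypothetical.

```python
def execution_time(n_inst, cpi, f_clk_hz):
    """Program execution time in seconds: N_inst * CPI / f_clk."""
    return n_inst * cpi / f_clk_hz


def speedup(old, new):
    """Speed-up of 'new' over 'old'; each is a (n_inst, cpi, f_clk_hz) tuple."""
    return execution_time(*old) / execution_time(*new)


if __name__ == "__main__":
    # Assumed, illustrative figures only.
    three_stage = (1_000_000, 1.9, 40e6)   # shallower pipeline, lower clock, higher CPI
    five_stage = (1_000_000, 1.5, 100e6)   # deeper pipeline, higher clock, lower CPI
    print("3-stage time:", execution_time(*three_stage), "s")
    print("5-stage time:", execution_time(*five_stage), "s")
    print("speed-up:", speedup(three_stage, five_stage))
```

The instruction count is held constant in this example, so the speed-up is the product of the clock-rate ratio and the CPI ratio.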

3.2 5-stage pipeline organization (2/4)
Organization
- Harvard architecture
  - Separate instruction and data caches
- Register bank
  - 3 read ports, 2 write ports
- Additional address incrementer for multiple load/store
- Forwarding paths to resolve data dependencies
[Figure: 5-stage datapath. Fetch: next pc, +4, I-cache. Instruction decode / register read: I decode, register bank, r15 = pc + 8, immediate fields. Execute: shift, ALU, mul, forwarding paths, pre-index and post-index address paths, LDM/STM incrementer. Buffer/data: load/store address, D-cache, byte replication, rotate/sign extend. Write-back: register write. PC updates come from B/BL, MOV pc, SUBS pc and LDR pc.]

3.2 5-stage pipeline organization (3/4)
Pipeline comparison [6]
- ARM7TDMI
  - Fetch: instruction fetch
  - Decode: Thumb decompress, ARM decode
  - Execute: register read, shift/ALU, register write
- ARM9TDMI
  - Fetch: instruction fetch
  - Decode: decode, register read
  - Execute: shift/ALU
  - Memory: data memory access
  - Write: register write
Interlock
    LDR rN, [..]       ; load rN from somewhere
    ADD r2, r1, rN     ; and use it immediately
- The ADD instruction cannot start until the data is returned from the load
- The ADD instruction must delay entering the execute stage of the pipeline by one cycle
PC behavior [1]
- The 5-stage pipeline emulates the PC behavior of the 3-stage design

3.2 5-stage pipeline organization (4/4)
[Figure: 5-stage pipeline timing (F, D, E, M, W) for an LDR followed by a dependent ADD, and for a branch.]
Separate caches
- The instruction and data caches can be accessed at the same time

3.3 Multiplier [1] (1/2)
Low-cost multiplication hardware
- 32-bit results for multiply and multiply-accumulate
- No longer used in recent cores
- Shift and add: the barrel shifter and ALU process two multiplier bits in each cycle, giving 16 cycles in the worst case
- Early termination logic
- Employs the modified Booth's algorithm (radix-4)
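The low-cost scheme above can be modelled in software as follows. This is a behavioral sketch only, not a description of the actual ARM datapath: it recodes two multiplier bits per step with radix-4 modified Booth digits and stops early once the remaining multiplier bits and the pending carry are zero. The function name booth_radix4_mul32 is hypothetical, and terminating only on all-zero bits (not on all-one bits) is a simplifying assumption.

```python
MASK32 = 0xFFFFFFFF


def booth_radix4_mul32(rm, rs):
    """Return (low 32 bits of rm * rs, number of add/subtract steps used).

    Each step consumes two bits of the multiplier rs, so at most 16 steps
    are needed for a 32-bit multiplier.
    """
    multiplicand = rm & MASK32
    multiplier = rs & MASK32
    acc = 0
    carry = 0
    steps = 0
    for step in range(16):
        if multiplier == 0 and carry == 0:
            break  # early termination: nothing left to add
        top = (multiplier >> 1) & 1
        # Booth digit in {-2, -1, 0, +1, +2}; a set top bit borrows 4 from
        # the next digit position, which is carried into the next step.
        digit = (multiplier & 0b11) + carry - 4 * top
        acc += digit * (multiplicand << (2 * step))
        carry = top
        multiplier >>= 2
        steps += 1
    return acc & MASK32, steps


if __name__ == "__main__":
    for a, b in ((7, 3), (1234, 1), (0x12345678, 0xFFFFFFFF)):
        product, steps = booth_radix4_mul32(a, b)
        assert product == (a * b) & MASK32
        print(hex(a), "*", hex(b), "=", hex(product), "in", steps, "steps")
```

A small multiplier finishes in one or two steps, while a multiplier with its top bits set runs the full 16 steps, which is the worst case quoted on the slide.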

3.3 Multiplier (2/2)
High-performance multiplication
- 64-bit results for multiply and multiply-accumulate
- Employs a 32x8 multiplier
  - 4 layers of carry-save adder array, each handling two multiplier bits
  - Multiplies eight bits per cycle
  - 4 cycles in the worst case
- Early termination logic
[Figure: the registers supply Rs (shifted right 8 bits per cycle) and Rm to the carry-save adders; the partial sum and partial carry are rotated 8 bits per cycle and finally added in the ALU; the accumulator is initialized for MLA.]

4. Processor Cores

4.1 Architecture evolutions
[Figure: architecture evolutions]

4.2 ARM7 Thumb family [7] (1/4)
ARM7 Thumb family (v4T)
- Low-power 32-bit RISC cores optimized for cost- and power-sensitive applications
- 3-stage pipeline
- Unified bus interface

4.2 ARM7 Thumb family (2/4)
ARM7TDMI
- Base integer core (hard macrocell)
  - A 3-volt-compatible rework of the ARM6 32-bit integer core
  - Low-power, fully static design
  - 3-stage pipeline
  - Unified bus interface
- The Thumb 16-bit compressed instruction set
- On-chip debug support [1]
  - Interface for direct connection to the Embedded Trace Macrocell
  - JTAG interface unit
- Enhanced multiplier yielding a full 64-bit result
- EmbeddedICE hardware to give on-chip breakpoint and watchpoint support

4.2 ARM7 Thumb family (3/4)
ARM7TDMI-S
- A synthesizable version of the ARM7TDMI
- Delivered as a high-level language module
- The core can be synthesized with reduced functionality
ARM720T macrocell
- High-performance processor for systems requiring full virtual memory management and protected execution spaces
- Additional features
  - 8K unified cache
  - Memory management unit
  - Write buffer
  - AMBA AHB bus interface

4.2 ARM7 Thumb family (4/4)
ARM7EJ-S
- Enhanced core
- ARM v5TEJ
- Jazelle technology
  - Hardware acceleration of Java byte-code execution
- DSP extensions
  - 16-bit data operations
  - Saturating, signed arithmetic
  - Enhanced MAC operations
Performance
[Figure: performance]

4.4 ARM9 family [8] (1/4)
ARM9 family (v4T)
[Figure: ARM9 family]

4.4 ARM9 family (2/4)
ARM9 family (v4T)
- Very high-performance, low-power 32-bit RISC cores for a wide variety of cost- and power-sensitive applications
- ARM and Thumb instruction sets
- 5-stage pipeline
- Up to 300 MIPS (Dhrystone 2.1) in a typical 0.13 µm process
- Single 32-bit AMBA interconnect interface
- MMU supporting a virtual memory system
- Harvard architecture
- 8-entry write buffer

4.4 ARM9 family (3/4)
ARM920T and ARM922T macrocells
- Intended to support platform operating systems such as Linux
- 16k I-cache and 16k D-cache (ARM920T) or 8k I-cache and 8k D-cache (ARM922T)
- MMU
- AMBA bus interface
- Embedded Trace Macrocell
ARM940T
- Applications such as DSL modem chipsets
- 4k I-cache and 4k D-cache
- Protection unit rather than an MMU

4.4 ARM9 family (4/4)
Performance
[Figure: performance]

4.5 ARM9E family [9] (1/4)
[Figure: ARM9E family]

4.5 ARM9E family (2/4)
ARM9E family (v5TE)
- Single-core solutions for microcontroller, DSP and Java applications
- Synthesizable soft IP
- 5-stage integer pipeline
- Harvard architecture
- ARM, Thumb and DSP instruction sets
- ARM Jazelle technology for Java acceleration (ARM926EJ-S)
- Up to 300 MIPS (Dhrystone 2.1) in a typical 0.13 µm process
- Integrated real-time trace and debug support
- Optional VFP9 coprocessor for floating-point operations
- High-performance AHB system
- Memory management unit
- 16-entry write buffer

4.5 ARM9E family (3/4)
The DSP extensions
- Single-cycle 16x16 and 32x16 MAC (multiply-accumulate) operations
- Enhanced saturation arithmetic behavior and performance
Tightly Coupled Memory
- TCMs are intended for storing real-time code and data
- Accesses to TCMs are deterministic and do not incur access penalties
- Cache preloads instructions

4.8 XScale (1/2)
- Developed by Intel
- ARM v5TE architecture
- Intel superpipelined RISC technology
  - 7-stage integer pipeline
  - MAC pipeline with early termination
  - 8-stage memory pipeline
- Branch target buffer (BTB)
- Separate caches and MMUs
  - 32k I-cache, 32k D-cache
- Multiply-accumulate coprocessor
  - Provides 40-bit accumulation of 16x16, dual 16x16 (SIMD) and 16x32 signed multiplies

4.8 XScale (2/2)
Clock and power management
- Supports dynamic clock and voltage scaling
Performance monitoring unit
- Two 32-bit event counters and one 32-bit clock counter
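The saturating arithmetic mentioned under the DSP extensions in Section 4.5 differs from ordinary wrap-around arithmetic only at the limits of the number range. The Python sketch below contrasts the two for signed 32-bit values. It is a generic illustration under that assumption, not a model of any particular ARM instruction; the function names are hypothetical.

```python
INT32_MIN = -(1 << 31)
INT32_MAX = (1 << 31) - 1


def wrapping_add32(a, b):
    """Signed 32-bit addition that wraps around on overflow."""
    total = (a + b) & 0xFFFFFFFF
    return total - (1 << 32) if total & 0x80000000 else total


def saturating_add32(a, b):
    """Signed 32-bit addition that clamps to the range limits.

    Returns (result, saturated), where 'saturated' reports whether clamping
    occurred, much as a sticky overflow flag would.
    """
    total = a + b
    if total > INT32_MAX:
        return INT32_MAX, True
    if total < INT32_MIN:
        return INT32_MIN, True
    return total, False


if __name__ == "__main__":
    print(wrapping_add32(INT32_MAX, 1))     # wraps to the most negative value
    print(saturating_add32(INT32_MAX, 1))   # clamps at the most positive value
```

For signal-processing code, clamping is usually the less harmful failure mode: an overflowing sample stays at full scale instead of flipping sign.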

5. ARM development environment
ARM Developer Suite (ADS)
- Integrated Development Environment (IDE)
  - CodeWarrior IDE: edit, compilation, …
  - AXD debugger: GUI debug environment
- ARMulator (software emulator)
Debug hardware
- Multi-ICE
  - JTAG-based in-circuit emulator
  - Controls the EmbeddedICE-RT and ETM logic
- MultiTrace
  - Trace port analyzer unit
  - Passively collects information from the ETM

6. IP Solutions [14]

6.1 AMBA
The de facto standard for on-chip buses
- AMBA is an open-standard on-chip bus specification
The Advanced High-performance Bus (AHB)
- Connects high-performance system modules
- Single clock edge
- Supports burst and split transactions
- Centrally multiplexed bus scheme
- AHB-Lite
  - A subset of the full AHB specification
  - A single bus master is used
- Multi-layer AHB
  - Multiple bus masters
The Advanced Peripheral Bus (APB)
- Simpler bus protocol designed for peripherals
- Connection to the system bus via a bridge

6.2 PrimeCell Peripherals [14]
- Re-usable soft IP macrocells developed to enable the rapid assembly of SoC designs
- Ready to use, fully verified and compliant with the AMBA on-chip bus standard
- Fully packaged, ready-to-use soft IP macrocells
  - Rapid and easy integration into AMBA-based SoC designs
  - Royalty-free license for single or multiple use
  - Supplied in VHDL and Verilog HDL with synthesis scripts
  - Software device drivers are included as source code

7. ARM applications

7.1 The Psion Series 5MX (1/2)
[Figure: system block diagram. An ARM7100 connected to DRAM, ROM, Flash, PSU, ADC, PC cards, infrared IrDA Tx/Rx, digitizing tablet, RS232, 640 x 240 LCD, audio codec and keyboard.]

7.1 The Psion Series 5MX (2/2)
[Figure: ARM7100 block diagram. ARM710a (ARM7 core, MMU, 8 Kbyte cache), AMBA, LCD controller, interrupt controller, 3.6864 MHz clock PLL, power management, counter/timers, 32.768 kHz RTC oscillator, external bus control with address (28) and data (32), DRAM controller with DRA (13), RAS, CAS (8) and WE, OE (2), UART, FIFOs, codec interface, synchronous serial, expansion, parallel I/O and PSU control.]

Summary
The Advanced RISC Machine
- Enhanced RISC architecture
- Simple hardware but effective instruction sets
Thumb instruction set
- High-density code on ARM cores
ARM offers a wide range of processor cores
ARM Ltd. provides designers with a fully integrated development environment
ARM cores are widely used in embedded markets

References
[1] Steve Furber, "ARM System-on-Chip Architecture", 2nd ed., Addison-Wesley, 2000
[2] "ARM Architecture Reference Manual", ARM Ltd., June 2000
[3] "ARM Architecture Version 6 (v6) White Paper", ARM Ltd., January 2002
[4] "Improving ARM code density and performance", ARM Ltd., June 2003
[5] Application Note 04, "Programmer's Model for Big-Endian ARM", ARM Ltd., December 1994
[6] "ARM9TDMI Rev3 Technical Reference Manual", March 2000
[7] "ARM7 Family Flyer", ARM Ltd.
[8] "ARM9 Family Flyer", ARM Ltd.
[9] "ARM9E Family Flyer", ARM Ltd.
[10] "ARM10E Family Flyer", ARM Ltd.
[11] "White paper - The ARM11 Microarchitecture", ARM Ltd., April 2002
[12] "Intel XScale Microarchitecture Technical Summary", Intel Co., 2000
[13] "ARM debugging techniques for embedded systems using real-time software trace", ARM Ltd., 2002
[14] "ARM Product Backgrounder", ARM Ltd., November 2003
[15] "Samsung communication MCU S3C4510", Samsung Electronics Co., Ltd.
[16] "Sceptre HPE EDGE/GPRS/GSM High performance solution", Agere Systems Inc., November 2003
[17] Yi Gao, Shilang Tang, Zhongli Ding, "Comparison between CISC and RISC", University of Maryland
[18] Application Note 29, "Interfacing a memory system to the ARM7TDMI without using AMBA", ARM Ltd., December 1995
[19] Arvind Krishnaswamy, Rajiv Gupta, "Profile guided selection of ARM and Thumb instructions", The University of Arizona