ARM Processor Overview
Prof. Taeweon Suh
Computer Science Education
Korea University

ARM (www.arm.com)

ARM
Source: 2008 Embedded SW Insight Conference

ARM Partners
Source: 2008 Embedded SW Insight Conference

ARM (as of 2008)
Source: 2008 Embedded SW Insight Conference

ARM Brief
• The ARM architecture was first developed in the 1980s by Acorn
• ARM was spun off from Acorn in 1990
• ARM6 was released in early 1992
• …
• As of 2013, the ARM architecture is the most widely used 32-bit ISA by number of units produced
• In 2010 alone, 6.1 billion ARM-based processors shipped, representing:
  - 95% of smartphones
  - 35% of digital TVs and set-top boxes
  - 10% of mobile computers
Source: Wikipedia

ARM Architecture
• ARM is a RISC (Reduced Instruction Set Computer) architecture
  - The x86 ISA, by contrast, is CISC (Complex Instruction Set Computer), even though x86 processors internally translate instructions into RISC-like micro-operations and pipeline them
• Suitable for embedded systems
  - Very small die size (low price)
  - Low power consumption (longer battery life)

ARM Processor Portfolio
Source: 2008 Embedded SW Insight Conference

Product Code
• T: Thumb
• T2: Thumb-2 enhancement
• D: Debug
• M: Multiplier
• I: Embedded ICE (In-Circuit Emulation)
• E: Enhanced DSP extension
• J: Jazelle (direct execution of 8-bit Java bytecode in hardware)
• S: Synthesizable core
• Z: TrustZone security extensions

ARM Cortex Series
• ARM Cortex-A family: applications processors for feature-rich OSes and 3rd-party applications
  - Cortex-A15 (up to 2.5 GHz, 1-4 cores), Cortex-A9 (1-4 cores), Cortex-A8, Cortex-A5 (1-4 cores)
• ARM Cortex-R family: embedded processors for real-time signal processing and control applications
  - Cortex-R7 (1-2 cores), Cortex-R5 (1-2 cores), Cortex-R4
• ARM Cortex-M family: microcontroller-oriented processors for MCU, ASSP, and SoC applications
  - Cortex-M4, SC300, Cortex-M3, Cortex-M1, SC000, Cortex-M0 (12k gates...)
Source: ARM Processor Portfolio 2011

ARMv7-A (www.arm.com)
• ACP: Accelerator Coherency Port
• SCU: Snoop Control Unit

ARM Processor Brief
Processor     Pipeline stages  Frequency  Architecture  Process
ARM6 (1992)   3                ~33 MHz    ARMv3         1.2 μm
ARM7TDMI      3                ~70 MHz    ARMv4T        0.13 μm
ARM920T       5                ~400 MHz   ARMv4T        90 nm
ARM1136J      8                ~1 GHz     ARMv6         65 nm
Cortex-A9     8~11 (OoO)       ~2 GHz     ARMv7         32 nm
Cortex-A15    15~24 (OoO)      ~2.5 GHz   ARMv7         22 nm
OoO: Out-of-Order

ARM Instruction Overview
• ARM is a RISC machine, so the instruction length is fixed
  - In ARM mode, instructions are 32 bits wide
  - In Thumb mode, instructions are 16 bits wide (Thumb-2 adds some 32-bit Thumb instructions)
• Most ARM instructions can be conditionally executed
  - An instruction has its normal effect only if the N (Negative), Z (Zero), C (Carry), and V (Overflow) flags in the CPSR satisfy the condition specified in the instruction
  - If the flags do not satisfy the condition, the instruction acts as a NOP (No Operation): it has no effect, and execution advances to the next instruction
  - For example, addeq R0, R0, R1 performs the addition only when Z = 1

ARM Instructions
• For the complete instruction set, refer to the "ARM Architecture Reference Manual"
• This course covers the essential and most important instructions
  - If you thoroughly understand one CPU, it is fairly straightforward to understand other CPUs

Essential Instructions
• Instruction categories
  - Data processing instructions: add, sub, cmp, and, orr
  - Memory access instructions: ldr, str
  - Branch instructions: b, bl
  - Miscellaneous instructions
[Figure: a real PC system (CPU, FSB (Front-Side Bus), North Bridge, Main Memory (DDR), DMI (Direct Media I/F), South Bridge) compared with a simplified ARM CPU connected to memory (instruction, data) through an address bus and a data bus]

A Memory Hierarchy
[Figure: DDR3, HDD, 2nd Gen. Core i7 (2011)]
From the higher level (on-chip, closest to the CPU core) to the lower level:
Level                            Speed (cycles)  Size (bytes)  Cost
Register file                    ½'s             100's         highest
L1I (instruction) / L1D (data)   1's             10K's
L2 / L3                          10's            M's
Main memory (DRAM)               100's           G's
Secondary storage (disk)         10,000's        T's           lowest
On-chip components include the CPU core, the register file, and the L1 and L2 caches.

ARM Registers
• ARM has 31 general-purpose registers and 6 status registers (32 bits each)
  - 31 = 16 registers visible in User mode + 7 banked FIQ registers (R8-R14) + 2 banked registers (R13, R14) for each of the IRQ, Supervisor, Abort, and Undefined modes
  - 6 = the CPSR + one saved status register (SPSR) for each of the 5 exception modes
• Unbanked registers: R0 ~ R7
  - Each refers to the same 32-bit physical register in all processor modes
  - They are completely general-purpose, with no special uses implied by the architecture
• Banked registers: R8 ~ R14
  - R8 ~ R12 have no dedicated special purposes
    - FIQ mode has its own dedicated copies of R8 ~ R12 for fast interrupt processing
  - Each exception mode has its own R13 and R14, which serve special purposes

R13, R14, and R15
• Some registers in ARM are used for special purposes
  - R15 == PC (Program Counter)
    - x86 uses the term IP (Instruction Pointer)
  - R14 == LR (Link Register)
  - R13 == SP (Stack Pointer)

ARM9 Register File
• The set of architectural (programmer-visible) registers inside a CPU is called the register file
  - A register file can be implemented with flip-flops or SRAM
  - The ARM9 register file has 16 32-bit registers, with
    - 3 read ports
    - 2 write ports
• Register file access is much faster than main memory or cache access, because there are very few registers and they reside inside the CPU
  - So compilers strive to keep values in the register file when translating high-level code to assembly code
[Figure: register file R0-R15 (32 bits each) with three read ports (4-bit src1/src2/src3 addresses, 32-bit src1/src2/src3 data) and two write ports (4-bit dst1/dst2 addresses, 32-bit write1/write2 data, write1/write2 enables)]

CPSR
• The Current Program Status Register (CPSR) is accessible in all modes
• It contains the condition flags, the interrupt disable bits, and the current processor mode

CPSR in ARM

CPSR bits
• ARM: state that executes 32-bit ARM instructions
• Thumb: state that executes 16-bit Thumb instructions
• Jazelle: special state for Java acceleration

ARM Instruction Format
• Arithmetic and logical instructions
• Memory access instructions (load/store)
• Branch instructions
• Software interrupt instruction

ARM Instruction Fields
Each ARM-mode instruction is 32 bits wide. Register fields are 4 bits because 4 bits are enough to name any of the 16 registers R0-R15.
• opcode: operation code
• Rn (4 bits): first source register
• Rm (4 bits): second source register
• Rs (4 bits): third source register (e.g., a register-specified shift amount or a multiply operand)
• Rd (4 bits): destination register
• shift (2 bits): shift type*
• shift amount (5 bits): number of bit positions to shift
* Shift types: logical shift left (LSL), logical shift right (LSR), arithmetic shift right (ASR), and rotate right (ROR)

Condition Field

Overview of ARM Operation
• ARM arithmetic in assembly form:
  add R3, R1, R5   @ R3 = R1 + R5
  - R1 and R5 are source operands, and R3 is the destination
  - @ starts a comment (GNU assembler syntax; ARM's own assembler uses ;), so the assembler ignores the rest of the line
• Operands of arithmetic instructions come either from special storage locations inside the CPU called registers, or from the immediate field of the instruction
  - All CPUs (x86, PowerPC, MIPS, ARM, …) have registers inside
• Registers are visible to programmers
  - ARM has a register file consisting of 16 registers (R0-R15 visible at any one time)

Simplified Version of CPU Internal
add R3, R1, R5   @ R3 = R1 + R5
[Figure: inside the ARM CPU, R1 and R5 are read from the 32-bit register file (R0-R15) and added, and the sum is written to R3; the CPU connects to memory through the address bus and the data bus]

ARM Register Convention
Name     Usage
R0~R3    Arguments passed to a subroutine; results returned from a subroutine
R13      Stack pointer
R14      Link register
R15      Program counter

Backup Slides

ARM Processor Family
Source: Wikipedia

NEON & VFP (www.arm.com)

Register Mapping
• NEON Advanced SIMD and VFP use the same register set

NEON
• Advanced SIMD (Single Instruction, Multiple Data)
• Supports 8-, 16-, 32-, and 64-bit integer and single-precision (32-bit) floating-point data
• Up to 16 operations at the same time
  - 1 B x 16 = 16 B (= 1 quadword)
Source: http://en.wikipedia.org/wiki/ARM_architecture

VFP (Vector Floating Point)
• FPU (Floating Point Unit) coprocessor extension to the ARM architecture
• Single-precision and double-precision FP computation
  - Compliant with IEEE 754-1985
• Intended to support execution of short "vector mode" instructions, but these operated on each vector element sequentially
  - Thus, it did not offer the performance of true SIMD
  - This vector mode was therefore removed shortly after its introduction and replaced by the much more powerful NEON Advanced SIMD
Source: http://en.wikipedia.org/wiki/ARM_architecture

ARM Processor Selector (www.arm.com)
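The conditional-execution rule from the ARM Instruction Overview and Condition Field slides can be modeled in a few lines. This is a minimal sketch, assuming the standard A32 condition-code encodings (EQ = 0x0 through AL = 0xE); `condition_passed` is a hypothetical helper, not part of any ARM tool. It reports whether an instruction with a given 4-bit condition field would execute; when it returns False, the instruction behaves as a NOP.

```python
# Hypothetical model of ARM (A32) conditional execution: the 4-bit cond field
# (bits 31-28 of an instruction) is tested against the CPSR flags N, Z, C, V.

def condition_passed(cond: int, n: bool, z: bool, c: bool, v: bool) -> bool:
    """Return True if an instruction with condition field `cond` would execute."""
    rules = {
        0x0: z,                       # EQ: equal
        0x1: not z,                   # NE: not equal
        0x2: c,                       # CS/HS: unsigned higher or same
        0x3: not c,                   # CC/LO: unsigned lower
        0x4: n,                       # MI: negative
        0x5: not n,                   # PL: positive or zero
        0x6: v,                       # VS: overflow
        0x7: not v,                   # VC: no overflow
        0x8: c and not z,             # HI: unsigned higher
        0x9: (not c) or z,            # LS: unsigned lower or same
        0xA: n == v,                  # GE: signed greater than or equal
        0xB: n != v,                  # LT: signed less than
        0xC: (not z) and n == v,      # GT: signed greater than
        0xD: z or n != v,             # LE: signed less than or equal
        0xE: True,                    # AL: always
    }
    if cond not in rules:
        # 0xF is not an ordinary condition (it selects special encodings).
        raise ValueError(f"unsupported condition field: {cond:#x}")
    return rules[cond]


# After cmp R1, R2 with R1 == R2: Z = 1 and C = 1 (no borrow), N = V = 0.
flags = {"n": False, "z": True, "c": True, "v": False}
print(condition_passed(0x0, **flags))  # addeq would execute: True
print(condition_passed(0x1, **flags))  # addne would act as a NOP: False
```

Keeping the predicates in one table mirrors how the hardware works: a single 4-bit field selects one test over the same four flags, so every conditional instruction shares the same check.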
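The CPSR slides say the register holds the condition flags, the interrupt-disable bits, the processor mode, and the ARM/Thumb/Jazelle state. A minimal decoder sketch follows, assuming the ARMv7 bit layout: N, Z, C, V in bits 31-28, J in bit 24, I and F (IRQ/FIQ disable) in bits 7 and 6, T in bit 5, and the mode in bits 4-0. The names `decode_cpsr`, `MODES`, and `STATES` are illustrative only, and the mode table lists just the classic seven modes (later extensions such as Monitor and Hyp mode are reported as "other/reserved").

```python
# Hypothetical CPSR decoder, assuming the ARMv7 bit layout.

MODES = {
    0b10000: "User",
    0b10001: "FIQ",
    0b10010: "IRQ",
    0b10011: "Supervisor",
    0b10111: "Abort",
    0b11011: "Undefined",
    0b11111: "System",
}

# The (J, T) bit pair selects the instruction set state.
STATES = {(0, 0): "ARM", (0, 1): "Thumb", (1, 0): "Jazelle", (1, 1): "ThumbEE"}


def bit(value: int, pos: int) -> int:
    """Return bit `pos` of `value` as 0 or 1."""
    return (value >> pos) & 1


def decode_cpsr(cpsr: int) -> dict:
    return {
        "N": bit(cpsr, 31),   # negative
        "Z": bit(cpsr, 30),   # zero
        "C": bit(cpsr, 29),   # carry
        "V": bit(cpsr, 28),   # overflow
        "I": bit(cpsr, 7),    # 1 = IRQ interrupts disabled
        "F": bit(cpsr, 6),    # 1 = FIQ interrupts disabled
        "state": STATES[(bit(cpsr, 24), bit(cpsr, 5))],
        "mode": MODES.get(cpsr & 0x1F, "other/reserved"),
    }


print(decode_cpsr(0x600000D3))
# {'N': 0, 'Z': 1, 'C': 1, 'V': 0, 'I': 1, 'F': 1, 'state': 'ARM', 'mode': 'Supervisor'}
```

Under this layout, 0x600000D3 describes Supervisor mode in ARM state with both IRQ and FIQ disabled and the Z and C flags set.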
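The ARM Instruction Fields slide names the pieces of a data-processing instruction, and the Overview of ARM Operation slide uses add R3, R1, R5 as its example. As a sketch, assuming the A32 data-processing encoding with a register operand (cond in bits 31-28, 00 in bits 27-26, I = 0 in bit 25, opcode in bits 24-21, S in bit 20, Rn in 19-16, Rd in 15-12, shift amount in 11-7, shift type in 6-5, Rm in 3-0), the hypothetical `encode_dp_reg` helper below packs those fields into a 32-bit word. It covers only the immediate-shift register form; the Rs (register-specified shift) form and immediate operands are left out.

```python
# Hypothetical assembler helper for A32 data-processing instructions whose
# second operand is a register, optionally shifted by a 5-bit immediate.
# Note: for LSR/ASR an amount of 0 encodes #32, and ROR #0 encodes RRX.

COND_AL = 0b1110  # "always" condition

OPCODES = {"AND": 0b0000, "EOR": 0b0001, "SUB": 0b0010, "ADD": 0b0100, "ORR": 0b1100}
SHIFT_TYPES = {"LSL": 0b00, "LSR": 0b01, "ASR": 0b10, "ROR": 0b11}


def encode_dp_reg(op: str, rd: int, rn: int, rm: int,
                  shift: str = "LSL", shift_amount: int = 0,
                  set_flags: bool = False, cond: int = COND_AL) -> int:
    """Encode `op{S} Rd, Rn, Rm{, shift #amount}` as a 32-bit word."""
    for reg in (rd, rn, rm):
        if not 0 <= reg <= 15:           # 4-bit register fields: R0-R15
            raise ValueError(f"register out of range: R{reg}")
    if not 0 <= shift_amount <= 31:      # 5-bit shift-amount field
        raise ValueError("shift amount must fit in 5 bits")
    return ((cond << 28)                 # condition field (bits 27-26 stay 00)
            | (0 << 25)                  # I = 0: operand 2 is a register
            | (OPCODES[op] << 21)        # operation code
            | (int(set_flags) << 20)     # S: update the CPSR flags?
            | (rn << 16)                 # first source register
            | (rd << 12)                 # destination register
            | (shift_amount << 7)        # shift amount
            | (SHIFT_TYPES[shift] << 5)  # shift type
            | rm)                        # second source register


print(hex(encode_dp_reg("ADD", rd=3, rn=1, rm=5)))  # 0xe0813005
```

The condition field sits in the top 4 bits of every instruction, which is why almost any ARM instruction can be made conditional simply by changing those bits.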
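The NEON slide's "1 B x 16 = 16 B" arithmetic means one 128-bit quadword register holds sixteen 8-bit lanes that a single instruction processes together. The sketch below models that idea, roughly what NEON's VADD.I8 does on Q registers, assuming wraparound (modulo-256) lane arithmetic; `vadd_i8` is a hypothetical name, and Python `bytes` objects stand in for registers.

```python
# Model of a 16-lane, 8-bit SIMD add on one 128-bit (16-byte) quadword.

LANES = 16


def vadd_i8(a: bytes, b: bytes) -> bytes:
    """Add two 16-byte vectors lane by lane; each lane wraps modulo 256."""
    if len(a) != LANES or len(b) != LANES:
        raise ValueError("a quadword holds exactly 16 bytes")
    # One "instruction" performs 16 independent additions; a carry out of
    # one lane never propagates into the next lane.
    return bytes((x + y) & 0xFF for x, y in zip(a, b))


a = bytes(range(16))      # lanes 0, 1, ..., 15
b = bytes([250] * 16)
print(list(vadd_i8(a, b)))
# [250, 251, 252, 253, 254, 255, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

A scalar core would need 16 separate add instructions for the same work; the lanes that overflow (from 6 onward) simply wrap instead of carrying into their neighbors.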