ARM (Advanced RISC Machine) – Introduction
Hyung Chul Park 박형철

Contents
• Introduction
• RISC architecture
• ARM design philosophy

What is ARM?
• ARM: Advanced RISC Machine (originally Acorn RISC Machine)
• The first ARM processor was developed at Acorn Computers Limited between 1983 and 1985.
• ARM is a 32-bit reduced instruction set computer (RISC) instruction set architecture (ISA) developed by ARM Holdings.

Instruction set types (1)
• Reduced Instruction Set Computers (RISC)
  – Single-cycle execution
  – Pipelined execution
    • Starting a second instruction before the first one has finished
  – A large bank of 32-bit registers, all of which can be used for any purpose, so that the load-store architecture operates efficiently
  – A load-store architecture: instructions that process data operate only on registers and are separate from instructions that access memory
  – A fixed (32-bit) instruction size with few formats
  – Hard-wired instruction decode logic

Instruction set types (2)
• Complex Instruction Set Computers (CISC)
  – Intended to reduce the semantic gap
    • The distance, in implementation terms, between a high-level language construct and a machine instruction
  – Single-instruction procedure entries and exits
  – Variable-length instructions with many formats
  – Complex sequences of operations spanning many clock cycles
  – CISC processors were sold on the sophistication and number of their addressing modes, data types, etc.
  – Developed in the 1970s, when main memory was slow, so processors were controlled by faster ROMs
  – Frequently used operations are drawn from ROM as microcode sequences rather than fetched as instructions from main memory

Instruction set types (3)
• Execution time = IC × CPI × CT
• IC (instruction count)
  – Number of instructions executed
• CPI (clocks per instruction)
  – Average number of clock cycles needed to execute one instruction
• CT (clock cycle time)
  – Clock period
• CISC: reduces IC
• RISC: reduces CPI and CT

Instruction set types (4)
• CISC emphasizes hardware complexity.
• RISC emphasizes compiler complexity.

RISC architecture
• Advantages
  – A smaller die size
    • A simpler processor requires fewer transistors and less silicon area.
  – A shorter development time
    • Less design effort and therefore a lower cost
  – Higher performance
    • Simpler instructions execute faster.
• Disadvantages
  – Poorer code density than CISC
  – Does not execute x86 code

History
• ARM Limited was founded in November 1990
  – Spun out of Acorn Computers
• Designs the ARM range of RISC processor cores
• Licenses ARM core designs to semiconductor partners, who fabricate the chips and sell them to their customers
  – ARM does not fabricate silicon itself
• Also develops technologies to assist with the design-in of the ARM architecture
  – Software tools, boards, debug hardware, application software, bus architectures, peripherals, etc.

ARM design philosophy – Overview
• Simplicity is the key philosophy behind the ARM design.
  – Reduced power consumption is an essential feature for portable embedded systems such as mobile phones and personal digital assistants (PDAs).
• Small silicon die area
  – A RISC machine with a small instruction set and consequently a small gate count
  – Low cost and low power consumption
  – In an SoC solution, more area is available for specialized peripherals.
• Hardware debug technology
  – Software (firmware) engineers can examine the processor state while the processor is executing code.
  – Reduces development cost and time

ARM design philosophy – Instruction set for embedded systems
• The ARM instruction set departs from a pure RISC architecture in several ways to suit embedded applications.
  – Variable cycle execution for certain instructions
  – Inline barrel shifter
  – Thumb 16-bit instruction set
  – Conditional execution
  – Enhanced digital signal processing (DSP) instructions

ARM core based embedded system architecture
• Main H/W components
  – ARM processor
  – Controller
  – Bus
  – Peripherals

System-on-chip (SoC)
• System
  – A collection of components and/or subsystems that are appropriately interconnected to perform the specified functions for end users
• SoC
  – A complex IC that integrates the major functional elements of a complete end product into a single chip or chipset
• Characteristics of SoC
  – Various function IPs
  – Bus-based system
  – Support for multiple masters
• Function IPs for SoC
  – Microprocessor
  – Special-purpose processor (DSP processor, TI C5x)
  – On-chip memory
  – Hardware accelerating function units
    • MPEG, JPEG, MP3 decoding
  – Peripheral interfaces (GPIO, SPI, I2C, UART)

SoC: Development history
• First stage, second stage, third stage

ARM core family
• Application cores
  – ARM720T, ARM920T, ARM922T, ARM926EJ-S, ARM1020E, ARM1022, ARM1026EJ-S, ARM11 MPCore, ARM1136J(F)-S, ARM1176JZ(F)-S, ARM Cortex-A8, ARM Cortex-A9, ARM Cortex-A15
• Embedded cores
  – ARM7EJ-S, ARM7TDMI, ARM7TDMI-S, ARM946E-S, ARM966E-S, ARM968E-S, ARM996HS, ARM1026EJ-S, ARM1156T2(F)-S, ARM Cortex-M0, ARM Cortex-M1, ARM Cortex-M3, ARM Cortex-M4
• Secure cores
  – SecurCore SC100, SecurCore SC110, SecurCore SC200, SecurCore SC210
• Suffix letters
  – T: Thumb, 16-bit instruction set
  – D: On-chip debug support
    • Enables the processor to halt in response to a debug request
  – M: Enhanced multiplier
    • Full 64-bit result, high performance
  – I: EmbeddedICE hardware
  – T2: Thumb-2
  – S: Synthesizable code
  – E: Enhanced DSP instruction set
  – J: Java support, Jazelle
  – F: Floating-point unit
  – H: Handshake, clockless design for synchronous or asynchronous design

ARM core family: Cores and architecture versions
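The suffix letters listed on the ARM core family slide can be read off a core name mechanically. The following is a minimal sketch of that idea, not an official ARM naming grammar: the function name `decode_core_name`, the table `SUFFIX_MEANING`, and the parsing rules (strip punctuation, try `T2` before single letters) are assumptions made for illustration. The meanings themselves are copied from the slide.

```python
import re

# Meanings copied from the suffix legend on the slide. The table layout and
# the parsing rules below are illustrative assumptions, not an ARM standard.
SUFFIX_MEANING = {
    "T2": "Thumb-2",
    "T": "Thumb, 16-bit instruction set",
    "D": "On-chip debug support",
    "M": "Enhanced multiplier (full 64-bit result)",
    "I": "EmbeddedICE hardware",
    "E": "Enhanced DSP instruction set",
    "J": "Java support (Jazelle)",
    "F": "Floating-point unit",
    "H": "Handshake (clockless) design",
    "S": "Synthesizable code",
}


def decode_core_name(name):
    """Return (core_number, [feature descriptions]) for a name like 'ARM7TDMI-S'."""
    match = re.fullmatch(r"ARM(\d+)(.*)", name)
    if match is None:
        raise ValueError(f"not a classic ARM core name: {name!r}")
    number, suffix = match.groups()

    # Drop punctuation such as '-', '(' and ')' so only suffix codes remain.
    letters = re.sub(r"[^A-Z0-9]", "", suffix)

    features = []
    i = 0
    while i < len(letters):
        # Try the two-character code (T2) before falling back to one letter.
        for width in (2, 1):
            code = letters[i:i + width]
            if code in SUFFIX_MEANING:
                features.append(SUFFIX_MEANING[code])
                i += width
                break
        else:
            # Letters missing from the slide's legend are reported, not guessed.
            features.append(f"unknown suffix {letters[i]!r}")
            i += 1
    return number, features
```

For example, `decode_core_name("ARM7TDMI-S")` yields core number `"7"` with the Thumb, debug, multiplier, EmbeddedICE and synthesizable entries, while a letter that the legend does not define (such as the Z in ARM1176JZ(F)-S) is reported as unknown rather than guessed.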
ARM architecture versions
• Version 1
  – The first ARM processor, developed at Acorn Computers Limited between 1983 and 1985
  – 32-bit data bus, 26-bit address space
  – No multiply or coprocessor support
• Version 2
  – Adds a 32-bit result multiply and coprocessor support
• Version 2a
  – Coprocessor 15 as the system control coprocessor to manage the cache
  – Adds the atomic load and store (SWP) instruction
    • Synchronization of shared memory in multi-master systems
• Version 3
  – First ARM processor designed by ARM Limited (1990)
  – 32-bit addressing, separate current program status register (CPSR) and saved program status registers (SPSRs)
  – ARM6 (macro cell), ARM60 (stand-alone processor)
  – ARM600 (an integrated CPU with on-chip cache, MMU, write buffer)
  – ARM610 (used in the Apple Newton)
  – Adds the undefined and abort modes to allow coprocessor emulation and virtual memory support in supervisor mode

ARM architecture versions
• Version 3M
  – Introduces the signed and unsigned multiply and multiply-accumulate instructions that generate the full 64-bit result
• Version 4
  – Adds the signed and unsigned half-word and signed byte load and store instructions
  – Reserves some of the SWI space for architecturally defined operations
  – Introduces system mode
• Version 4T
  – Introduces the 16-bit Thumb compressed form of the instruction set
• Version 5T
  – A superset of version 4T that adds the BLX, CLZ and BKPT (breakpoint) instructions
• Version 5TE
  – Adds the signal processing instruction set extension

ARM architecture versions
• Version 6
  – Media processing extensions (SIMD)
    • 2x faster MPEG4 encode/decode
    • 2x faster audio DSP
  – Improved cache architecture
    • Physically addressed caches
    • Reduction in cache flush/refill
    • Reduced overhead in context switches
  – Improved exception and interrupt handling
    • Important for improving performance in real-time tasks
  – Unaligned and mixed-endian data support
    • Simplifies data sharing and application porting, and saves memory

ARM architecture versions
• Version 7

ARM Cortex
• The ARM Cortex family includes processors based on the three distinct profiles of the ARMv7 architecture.
• The A profile for sophisticated, high-end applications running open and complex operating systems
• The R profile for real-time systems
• The M profile optimized for cost-sensitive and microcontroller applications

ARM bus technology
• Embedded systems use their own bus technologies rather than those designed for x86 PCs.
  – PCI (Peripheral Component Interconnect) bus
    • The most common PC bus
    • External (off-chip)
  – Embedded devices use an on-chip bus
    • ARM AMBA (Advanced Microcontroller Bus Architecture)
    • Altera Avalon
    • IBM CoreConnect
    • Silicore Corporation’s WISHBONE

ARM bus technology
• Classes
  – Bus master
    • It can initiate a data transfer.
  – Bus slave
    • It only responds to a transfer request from a bus master.
• Bus protocol: AMBA (Advanced Microcontroller Bus Architecture)
  – It was introduced in 1996.
  – Buses
    • Advanced System Bus (ASB)
    • Advanced Peripheral Bus (APB)
    • Advanced High-performance Bus (AHB)

AMBA system
• AMBA system components
  – A high-speed bus (ASB or AHB) for the CPU
    • Advanced High-performance Bus (AHB)
      – Provides a high-bandwidth communication channel between the embedded processor (ARM, MIPS, AVR, DSP 320xx, 8051, etc.) and high-performance peripherals and hardware accelerators (ASICs, MPEG, color LCD, etc.), on-chip SRAM, the on-chip external memory interface, and the APB bridge
    • Advanced System Bus (ASB)
      – Fast memory and DMA
  – A bus for peripherals (APB), connected via a bridge to the high-speed bus
    • Optimized for minimal power consumption and reduced interface complexity to support peripheral functions
    • Architecture
      – A single master (the bridge)
      – Multiple slaves

AMBA AHB master
• Can initiate read and write operations by providing address and control information.

AMBA AHB slave
• Responds to a read or write operation within a given address-space range.
• Signals back to the active bus master whether the data transfer succeeded, failed, or is still waiting.
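The master and slave roles described above can be sketched as a small behavioral model. This is a simplification under stated assumptions, not the signal-level AHB protocol: the names `Response`, `MemorySlave` and `master_access` are hypothetical, wait states are modeled as a simple countdown, and the three responses stand in for the success, failure and waiting outcomes named on the slide (real AHB conveys them with the HREADY and HRESP signals).

```python
from enum import Enum


class Response(Enum):
    """Outcome a slave reports to the active master (hypothetical names)."""
    OKAY = "okay"    # transfer succeeded
    ERROR = "error"  # transfer failed
    WAIT = "wait"    # slave needs more cycles


class MemorySlave:
    """A slave that answers only inside its own address range."""

    def __init__(self, base, size, wait_states=0):
        self.base = base
        self.size = size
        self.wait_states = wait_states
        self._remaining = wait_states  # wait cycles left for the current transfer
        self._mem = {}

    def transfer(self, address, write, data=None):
        """Model one bus cycle; return (Response, read_data)."""
        if not (self.base <= address < self.base + self.size):
            return Response.ERROR, None       # outside this slave's range
        if self._remaining > 0:
            self._remaining -= 1
            return Response.WAIT, None        # insert a wait state
        self._remaining = self.wait_states    # re-arm for the next transfer
        if write:
            self._mem[address] = data
            return Response.OKAY, None
        return Response.OKAY, self._mem.get(address, 0)


def master_access(slave, address, write, data=None, max_cycles=16):
    """Master side: hold address/control steady until the slave stops waiting."""
    for cycle in range(1, max_cycles + 1):
        response, value = slave.transfer(address, write, data)
        if response is not Response.WAIT:
            return response, value, cycle
    return Response.ERROR, None, max_cycles   # give up after max_cycles
```

With `MemorySlave(0x40000000, 0x100, wait_states=2)`, a write followed by a read of the same address each complete on the third cycle, and an access outside the range is rejected immediately with `Response.ERROR`.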
AMBA arbiter
• Role: controls which master has access to the bus.
• Every bus master has a REQUEST/GRANT interface to the arbiter, and the arbiter in turn uses a prioritization scheme to decide which master is granted the bus.

AMBA decoder
• Decodes the address of each transfer and provides a select signal for the slave involved.
• A single centralized decoder is required in all AHB implementations to provide a select signal, HSELx, for each slave on the bus.

AMBA AHB bus interconnection
• Example design with 3 masters and 4 slaves
• The AHB protocol is based on a central multiplexer interconnection scheme.
  – All bus masters drive their requests in the form of address and control signals.
  – The arbiter chooses one master.
  – The address and control signals of that master are routed to all slaves.
  – The decoder selects the signals from the slave that is involved in the transfer with the bus master.
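The arbiter, decoder and central multiplexer described above can be modeled in a few lines. This is a sketch under assumptions rather than a description of any real AHB implementation: the slides only say that the arbiter uses "a prioritization scheme", so fixed priority (lowest master number wins) is assumed here, and the memory map in `ADDRESS_MAP`, the slave names, and the function names `arbitrate`, `decode` and `route` are all hypothetical.

```python
# Hypothetical memory map for the 4 slaves: (base address, size, name).
ADDRESS_MAP = [
    (0x00000000, 0x1000, "rom"),
    (0x20000000, 0x1000, "sram"),
    (0x40000000, 0x0100, "lcd"),
    (0x80000000, 0x1000, "apb_bridge"),
]


def arbitrate(requests):
    """Fixed-priority arbiter (an assumed scheme): lowest-numbered requester wins.

    `requests` holds one boolean per master; returns the granted master's
    index, or None when no master is requesting the bus.
    """
    for master_id, requesting in enumerate(requests):
        if requesting:
            return master_id
    return None


def decode(address, address_map=ADDRESS_MAP):
    """Central decoder: one select flag (HSELx) per slave for this address."""
    return [base <= address < base + size for base, size, _name in address_map]


def route(requests, addresses, address_map=ADDRESS_MAP):
    """Central multiplexer: forward only the granted master's address.

    Returns (granted master, address seen by all slaves, select flags),
    or None when the bus is idle.
    """
    granted = arbitrate(requests)
    if granted is None:
        return None
    address = addresses[granted]          # mux picks the granted master's address
    return granted, address, decode(address, address_map)
```

With three masters where only masters 1 and 2 request the bus, master 1 is granted, its address is the one presented to every slave, and the decoder asserts exactly one select flag for the slave whose range contains that address. A round-robin arbiter could replace `arbitrate` without touching the decoder or the multiplexer, which is one reason the two roles are kept separate.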