ARM (Advanced RISC Machine) – Introduction

Hyung Chul Park
박형철
Contents
• Introduction
• RISC architecture
• ARM design philosophy
What is ARM?
• ARM: Advanced RISC Machine (originally Acorn RISC Machine)
• The first ARM processor was developed at Acorn Computers
Limited between 1983 and 1985.
• ARM is a 32-bit reduced instruction set computer (RISC)
instruction set architecture (ISA) developed by ARM Holdings.
Instruction set types (1)
• Reduced Instruction Set Computers (RISC)
– Single-cycle execution
– Pipeline execution
• Starting a second instruction before the first one has finished
– A large register bank of 32-bit registers, all of which can be used for
any purpose, to allow the load-store architecture to operate efficiently
– A load-store architecture where instructions that process data
operate only on registers and are separate from instructions that
access memory
– A fixed (32 bit) instruction size with few formats.
– Hard-wired instruction decode logic
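The benefit of pipeline execution listed above can be sketched with an idealized cycle count. This is a minimal model, assuming every stage takes one clock and the pipeline never stalls; the three-stage depth and the function names are illustrative assumptions, not taken from the slides.

```python
def cycles_unpipelined(instructions, stages):
    """Each instruction passes through every stage before the next one starts."""
    return instructions * stages


def cycles_pipelined(instructions, stages):
    """Ideal pipeline: fill it once, then one instruction completes per clock."""
    if instructions == 0:
        return 0
    return stages + instructions - 1


# Assumed example: 100 instructions on a 3-stage pipeline.
print(cycles_unpipelined(100, 3))
print(cycles_pipelined(100, 3))
```

As the instruction count grows, the pipelined count approaches one clock per instruction, which is the single-cycle execution goal named in the list.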
Instruction set types (2)
• Complex Instruction Set Computers (CISC)
– Intended to reduce the semantic gap.
• The distance, in implementation terms, between a high-level language
construct and a machine instruction.
– Single instruction procedure entries and exits
– Variable length instruction sets with many formats
– Complex sequence of operations over many clock cycles
– Processors based on CISC were sold on the sophistication and
number of their addressing modes, data types, etc
– Developed in the 1970s, when computers had slow main memory, so
processors were controlled by faster ROMs
– Frequently used operations are drawn from ROM as microcode
sequences rather than having instructions pulled from main memory
Instruction set types (3)
• Execution time = IC × CPI × CT
• IC (Instruction Count)
– Number of instructions executed
• CPI (Clocks per Instruction)
– Average number of clock cycles needed to execute one instruction
• CT (Cycle Time)
– Clock period
• CISC: reducing IC
• RISC: reducing CPI and CT
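The execution-time relation above can be tried out with a few lines of code. The workload figures below are hypothetical, chosen only to show how a higher instruction count can be outweighed by lower CPI and a shorter clock period.

```python
def execution_time(instruction_count, cpi, cycle_time_s):
    """Execution time = IC x CPI x CT, in seconds."""
    return instruction_count * cpi * cycle_time_s


# Hypothetical workloads, not measurements of any real processor.
cisc_like = execution_time(1_000_000, cpi=4.0, cycle_time_s=10e-9)
risc_like = execution_time(1_300_000, cpi=1.2, cycle_time_s=5e-9)

print(f"CISC-like: {cisc_like * 1e3:.2f} ms")
print(f"RISC-like: {risc_like * 1e3:.2f} ms")
```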
Instruction set types (4)
• CISC emphasizes hardware complexity.
• RISC emphasizes compiler complexity.
RISC architecture
• Advantages
– A smaller die size
• A simpler processor requires fewer transistors and less silicon area.
– A shorter development time
• Less design effort and therefore a lower cost
– Higher performance
• Simpler instructions execute faster.
• Disadvantages
– Poorer code density than CISC
– Cannot run x86 code natively
History
• ARM Ltd was founded in November 1990
– Spun out of Acorn Computers
• Designs the ARM range of RISC processor cores
• Licenses ARM core designs to semiconductor partners, who
fabricate the chips and sell them to their customers
– ARM does not fabricate silicon itself
• Also develops technologies to assist with the design-in of the
ARM architecture
– Software tools, boards, debug hardware, application software,
bus architectures, peripherals, etc.
ARM design philosophy - Overview
• Simplicity is the key philosophy behind the ARM design.
– Reduced power consumption is an essential feature for portable
embedded systems such as mobile phones and personal digital
assistants (PDAs).
• Small amount of silicon die area
– RISC machine with small instruction set and consequently a small
gate count.
– low cost and low power consumption
– For SoC solution, more area is available for specialized peripherals.
• Hardware debug technology
– Software (firmware) engineers can examine the processor
state while the processor is executing code.
– Reduces development cost and time.
ARM design philosophy – Instruction set for
embedded systems
• The ARM instruction set departs from a pure RISC design
to suit embedded applications.
– Variable cycle execution for certain instructions
– Inline barrel shifter
– Thumb 16-bit instruction set
– Conditional execution
– Enhanced digital signal processing (DSP) instructions
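Two of the features above, the inline barrel shifter and conditional execution, can be sketched as plain functions. This is only a behavioural model under simplifying assumptions (32-bit wrap-around, a single Z flag, left shifts only); the function names are hypothetical, and the assembly forms in the comments show the kind of instruction being imitated.

```python
MASK32 = 0xFFFFFFFF


def lsl(value, amount):
    """Logical shift left, truncated to 32 bits."""
    return (value << amount) & MASK32


def add_shifted(rn, rm, shift):
    """Models ADD Rd, Rn, Rm, LSL #shift: the shift happens inside the same instruction."""
    return (rn + lsl(rm, shift)) & MASK32


def add_if_equal(rd, rn, rm, z_flag):
    """Models ADDEQ Rd, Rn, Rm: Rd is updated only when the Z flag is set."""
    if z_flag:
        return (rn + rm) & MASK32
    return rd


# Multiply by 5 in one instruction: r0 = r1 + (r1 << 2)
print(add_shifted(7, 7, 2))
# The condition fails, so the destination register keeps its old value.
print(add_if_equal(rd=99, rn=1, rm=2, z_flag=False))
```

Folding the shift into the data-processing instruction and skipping instructions by condition both avoid extra instructions and branches, which helps code density in embedded code.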
ARM core based embedded system architecture
• Main H/W components
– ARM processor
– Controller
– Bus
– Peripherals
System-on-chip (SoC)
• System
– A collection of all kinds of components and/or subsystems that are
appropriately interconnected to perform the specified functions for
end users.
System-on-chip (SoC)
• SoC
– Complex IC that integrates the major functional elements of a
complete end-product into a single chip or chipset.
System-on-chip (SoC)
• Characteristics of SoC
– Various function IPs
– Bus-based system
– Supporting multi-master
• Function IPs for SoC
– Microprocessor
– Special-purpose processor (DSP processor, TI C5x)
– On-chip memory
– Hardware accelerating function units
• MPEG, JPEG, MP3 decoding
– Peripheral interfaces (GPIO, SPI, I2C, UART)
SoC : Development history
[Figure: SoC development in three stages (first, second, third)]
ARM core family
• Application cores
– ARM720T, ARM920T, ARM922T, ARM926EJ-S
– ARM1020E, ARM1022, ARM1026EJ-S
– ARM11 MPCore, ARM1136J(F)-S, ARM1176JZ(F)-S
– ARM Cortex-A8, ARM Cortex-A9, ARM Cortex-A15
• Embedded cores
– ARM7EJ-S, ARM7TDMI, ARM7TDMI-S
– ARM946E-S, ARM966E-S, ARM968E-S, ARM996HS
– ARM1026EJ-S, ARM1156T2(F)-S
– ARM Cortex-M0, ARM Cortex-M1, ARM Cortex-M3, ARM Cortex-M4
• Secure cores
– SecurCore SC100, SC110, SC200, SC210
• Core name suffixes
– T: Thumb, 16-bit instruction set
– D: On-chip debug support, enabling the processor to halt in response to a debug request
– M: Enhanced multiplier, full 64-bit result, high performance
– I: EmbeddedICE hardware
– T2: Thumb-2
– S: Synthesizable code
– E: Enhanced DSP instruction set
– J: Java support, Jazelle
– F: Floating point unit
– H: Handshake, clock-less design for synchronous or asynchronous design
ARM core family : Cores and architecture ver.
ARM architecture versions
• Version 1
– The first ARM processor, developed at Acorn Computers Limited 1983-1985
– 32-bit data bus, 26-bit address space
– No multiply or coprocessor support
• Version 2
– 32-bit result multiply and coprocessor support
• Version 2a
– Coprocessor 15 as the system control coprocessor to manage the cache
– Adds the atomic load and store (SWP) instruction
• Synchronization of shared memory for multi-master systems
• Version 3
– First ARM processor designed by ARM Limited (1990)
– 32-bit addressing, separate current program status register (CPSR) and saved program status registers (SPSRs)
– ARM6 (macro cell), ARM60 (stand-alone processor)
– ARM600 (an integrated CPU with on-chip cache, MMU, write buffer)
– ARM610 (used in Apple Newton)
– Adds the undefined and abort modes to allow coprocessor emulation and virtual memory support in supervisor mode
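Two points from the version list above lend themselves to a short sketch: the jump from a 26-bit to a 32-bit address space, and the way an atomic swap such as SWP can guard shared memory. The code is a single-threaded behavioural model only; the lock convention (0 = free, 1 = taken) and the helper names are assumptions made for illustration.

```python
def address_space_bytes(address_bits):
    """Number of byte addresses reachable with the given address width."""
    return 1 << address_bits


def swp(memory, address, new_value):
    """Model of SWP: return the old word at `address` and store `new_value`.

    On real hardware the read and the write form one indivisible bus
    operation; this model only mimics the end result.
    """
    old_value = memory.get(address, 0)
    memory[address] = new_value
    return old_value


def try_lock(memory, lock_address):
    """Hypothetical lock helper: succeeds only if the lock word was 0 (free)."""
    return swp(memory, lock_address, 1) == 0


print(address_space_bytes(26) // 2**20, "MiB")  # 26-bit address space
print(address_space_bytes(32) // 2**30, "GiB")  # 32-bit addressing

shared = {}
print(try_lock(shared, 0x100))  # first master takes the lock
print(try_lock(shared, 0x100))  # second attempt finds it taken
```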
ARM architecture versions
• Version 3M
– Introduces the signed and unsigned multiply and multiply-accumulate instructions that generate the full 64-bit result
• Version 4
– Adds the signed and unsigned half-word and signed byte load and store instructions
– Reserves some of the SWI space for architecturally defined operations
– System mode is introduced
• Version 4T
– 16-bit Thumb compressed form of the instruction set is introduced
• Version 5T
– A superset of version 4T adding the BLX, CLZ and BKPT (breakpoint) instructions
• Version 5TE
– Adds the signal processing instruction set extension
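The 64-bit multiply result and the CLZ instruction mentioned in the list above can be modelled in a few lines. This is a behavioural sketch rather than a description of any particular core; the function names are hypothetical, and `umull` returns the low and high 32-bit halves of the product in the way a long multiply writes two registers.

```python
MASK32 = 0xFFFFFFFF


def clz(value):
    """Count leading zeros of a 32-bit value; returns 32 when the value is 0."""
    return 32 - (value & MASK32).bit_length()


def umull(a, b):
    """Unsigned 32 x 32 multiply returning (low word, high word) of the 64-bit product."""
    product = (a & MASK32) * (b & MASK32)
    return product & MASK32, product >> 32


print(clz(1))
print(clz(0x80000000))
low, high = umull(0xFFFFFFFF, 2)
print(hex(low), hex(high))
```

Without the long multiply, the high word would be lost; CLZ is useful for normalisation steps such as finding the highest set bit.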
ARM architecture versions
• Version 6
– Media processing extensions (SIMD)
• 2x faster MPEG4 encode/decode
• 2x faster audio DSP
– Improved cache architecture
• Physically addressed caches
• Reduction in cache flush/refill
• Reduced overhead in context switches
– Improved exception and interrupt handling
• Important for improving performance in real-time tasks
– Unaligned and mixed-endian data support
• Simplifies data sharing and application porting, and saves memory
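The idea behind the SIMD media extensions can be illustrated with a packed byte addition: one 32-bit register is treated as four independent 8-bit lanes. The sketch below is in the style of a four-lane unsigned add with wrap-around in each lane; it is an assumption-laden model that ignores status flags, and the function name is hypothetical.

```python
def packed_add_u8(a, b):
    """Add four unsigned 8-bit lanes packed in 32-bit words; each lane wraps modulo 256."""
    result = 0
    for lane in range(4):
        shift = 8 * lane
        lane_sum = ((a >> shift) & 0xFF) + ((b >> shift) & 0xFF)
        result |= (lane_sum & 0xFF) << shift  # a carry never leaks into the next lane
    return result


# Four pixel or audio samples are processed by a single operation.
print(hex(packed_add_u8(0x01FF0203, 0x01010101)))
```

Handling several small samples per instruction is where the claimed speed-up for media and audio workloads comes from.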
ARM architecture versions
• Version 7
ARM Cortex
• The ARM Cortex family includes processors based on the
three distinct profiles of the ARMv7 architecture.
• The A profile for sophisticated, high-end applications
running open and complex operating systems
• The R profile for real-time systems
• The M profile optimized for cost-sensitive and
microcontroller applications
ARM bus technology
• Embedded systems use their own bus technologies rather than
those designed for x86 PCs.
– PCI (Peripheral Component Interconnect) bus
• Most common PC bus
• External or off-chip
– Embedded devices use an on-chip bus
• ARM AMBA (Advanced Microcontroller Bus Architecture)
• Altera Avalon
• IBM CoreConnect
• Silicore Corporation’s Wishbone
ARM bus technology
• Classes
– Bus master
• It can initiate a data transfer.
– Bus slave
• It only responds to a transfer request from a bus master.
• Bus protocol : AMBA (Advanced Microcontroller Bus
Architecture)
– It was introduced in 1996.
– Buses
• Advanced System Bus (ASB)
• Advanced Peripheral Bus (APB)
• Advanced High-Performance Bus (AHB)
AMBA system
• AMBA system components
– A high-speed bus (ASB or AHB) for the CPU
• Advanced High-Performance Bus (AHB)
– Provides a high-bandwidth communication channel between the embedded processor (ARM,
MIPS, AVR, DSP 320xx, 8051, etc.) and high-performance peripherals and hardware
accelerators (ASICs: MPEG, color LCD, etc.), on-chip SRAM, the on-chip external
memory interface, and the APB bridge
• Advanced System Bus (ASB)
– Fast memory and DMA.
– A bus for peripherals (APB), connected via a bridge to the high-speed bus.
• Optimized for minimal power consumption and reduced interface complexity to
support peripheral functions
• Architecture
– (Single) MASTER (bridge)
– (Multi) SLAVE
AMBA AHB master
• Can initiate read and write operations by providing address
and control information.
AMBA AHB slave
• Responds to a read or write operation within a given
address-space range.
• Signals back to the active bus master whether the data transfer
succeeded, failed or must wait.
AMBA arbiter
• The role
– to control which master has access to the bus.
• Every bus master has a REQUEST/GRANT interface to the
arbiter, and the arbiter in turn uses a prioritization scheme.
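A prioritization scheme of the kind described above can be sketched as a fixed-priority arbiter. The slides do not say which scheme is used, so fixed priority is an assumption here, and the master names are hypothetical.

```python
def arbitrate(requests, priority_order):
    """Return the master that is granted the bus, or None if nobody requests it.

    `requests` maps a master name to True when its REQUEST line is asserted;
    `priority_order` lists masters from highest to lowest priority.
    """
    for master in priority_order:
        if requests.get(master, False):
            return master
    return None


priority = ["dma", "cpu", "test_interface"]  # assumed ordering
print(arbitrate({"cpu": True, "dma": True}, priority))
print(arbitrate({"cpu": True, "dma": False}, priority))
```

Other policies, such as round-robin, would change only the body of `arbitrate`; the REQUEST/GRANT interface seen by the masters stays the same.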
AMBA decoder
• Decodes the address of each transfer, and provides a select
signal for the involved slave.
• A single centralized decoder is required in all AHB
implementations, to provide a select signal, HSELx, for each
slave on the bus.
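The decoder's job can be sketched as a lookup in an address map that produces one select signal per slave. The memory map below is entirely hypothetical; only the idea of one HSELx signal per slave comes from the slide.

```python
# Hypothetical address map: (slave name, first address, last address).
ADDRESS_MAP = [
    ("sram", 0x0000_0000, 0x0000_FFFF),
    ("external_memory", 0x1000_0000, 0x1FFF_FFFF),
    ("apb_bridge", 0x8000_0000, 0x8000_FFFF),
]


def decode(address):
    """Return the select signal for every slave; at most one is True."""
    return {name: low <= address <= high for name, low, high in ADDRESS_MAP}


print(decode(0x8000_0010))
```

Keeping the decoder central means the slaves themselves do not need to know the system memory map.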
AMBA AHB bus interconnection
• Example design with 3 masters and 4 slaves
• The AHB protocol is based on a central multiplexer
interconnection scheme.
• All bus masters send their requests in the form of address
and control signals.
• The arbiter chooses one master.
– The address and control signals are routed to all slaves.
• The decoder selects the signals from the slave that is
involved in the transfer with the bus master.
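The flow described above (arbiter picks a master, its address and control signals are routed out, the decoder picks the slave whose response is returned) can be put together in one small model. It is a sketch under heavy simplification: one transfer per call, no wait states or error responses, fixed-priority arbitration, and hypothetical master and slave names.

```python
class Slave:
    """Toy memory-like slave covering addresses [base, base + size)."""

    def __init__(self, base, size):
        self.base = base
        self.size = size
        self.words = {}

    def contains(self, address):
        return self.base <= address < self.base + self.size

    def transfer(self, address, write, wdata):
        if write:
            self.words[address] = wdata
            return None
        return self.words.get(address, 0)


def bus_transfer(requests, priority_order, slaves):
    """Carry out one transfer; returns (master, slave name, read data) or None."""
    # Arbiter: grant the highest-priority master that is requesting.
    granted = next((m for m in priority_order if m in requests), None)
    if granted is None:
        return None
    # Master-side multiplexer: only the granted master's signals go out.
    address, write, wdata = requests[granted]
    # Decoder: select the slave whose address range holds the address.
    for name, slave in slaves.items():
        if slave.contains(address):
            # Slave-side multiplexer: return the selected slave's response.
            return granted, name, slave.transfer(address, write, wdata)
    return granted, None, None


slaves = {"sram": Slave(0x0000_0000, 0x1000), "lcd": Slave(0x4000_0000, 0x100)}
priority = ["dma", "cpu"]

bus_transfer({"cpu": (0x10, True, 0xABCD)}, priority, slaves)  # CPU writes to SRAM
print(bus_transfer({"cpu": (0x10, False, None)}, priority, slaves))
```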