ARM Core Architecture

Common ARM Cortex Core
In the case of ARM-based microcontrollers, a company named ARM Holdings designs the core and licenses it to manufacturers such as ST (or NXP, Apple, Samsung, Qualcomm, HP, etc.).
Result: the same CPU (core) but different peripherals.

Why CPU Registers?
Load parameters into CPU registers, execute the operation, and store the result in memory.

User's View of Cortex-M3 CPU Registers
• General-purpose registers R0 to R12 are used to store data and addresses.
• Stack Pointer (SP) controls stack memory operations such as PUSH and POP.
• Link Register (LR) stores the return program counter value when a subroutine is called.
• Program Counter (PC) tracks the address of the instruction being executed.
• Program Status Register (PSR) stores flags (i.e. single bits) that represent the current status of the CPU.

ARM Organization (simplified)

Load & Store Architecture
ALL data operations are performed on CPU registers only.

Instruction Pipeline

Stalled Pipeline
To speed up execution, (partially) unroll the loops:

    do { ++count; ++count; } while (count < 21);

Memory Map
CPU registers are accessed by their name (e.g. R1, LR). Other registers, including memories, are accessed by address values. Addresses are 32-bit unsigned integers and form a linear 4 GB (2^32 bytes) space called the Memory Map. Not all addresses are implemented!

    Storage                            Location accessed    Address range (STM32)
    Code                               Flash ROM            0x00000000 - 0x1FFFFFFF
    Data                               SRAM                 0x20000000 - 0x3FFFFFFF
    Peripheral Data & Configuration    On-chip hardware     0x40000000 - 0x5FFFFFFF
    External                           Ext memory           0x60000000 - 0x9FFFFFFF

Microcontroller Programming Paradigm
1. Decide which peripheral you want to use.
2. Look in the datasheet for the registers that enable and configure it.
3. Set bits in those registers to make the peripheral behave the way you want.
4. GOTO 1

Instruction Set Architecture - ISA
A microprocessor's ISA gives the programmer an overview of:
• the data types used
• the types of machine instructions
• the addressing modes and memory access
• the CPU registers and their role
• access to peripherals
• the operation of interrupts

x86 ISA Example
All x86 ISA processors will run the SAME user code. Because the architecture is implemented in hardware in different ways, processor performance (such as execution speed and power consumption) differs widely.
• High-end processors (e.g. Intel Core i7) have specific hardware components to perform many common operations, several fast memory caches, fast data and address buses, several parallel CPUs, etc.
• Medium-end processors (e.g. Intel Core 2) have microprogrammed components that are an order of magnitude slower.
• Low-end processors (e.g. Intel Atom) have to perform some CPU operations (e.g. memory access and arithmetic) using software routines rather than in hardware.

Cortex-M Machine Instructions (Thumb-2)
• Instruction length can be either 16 or 32 bits.
• Instructions are fetched from Flash memory or SRAM.
• Instructions are aligned in memory on half-word (16-bit) boundaries.
• Reduced Instruction Set Computer (RISC): there are about 100 instructions.
• Most instructions offer the option of conditional execution (in Thumb-2, inside an IT block).

Machine Instruction Types

    Instruction type          Mnemonic example    Description          Frequency
    Data Movement             MOV                 R ← R                80%
                              STR                 SRAM ← R
                              LDR                 R ← SRAM
    Arithmetic & Logic Ops    ADD                 R ← Rs1 + Rs2        10%
    Flow Control              B                   Branch to address    10%

Anatomy of Assembler Commands
Instructions: translated to binary machine code by the assembler.

    <label>   opcode   <dest, src1, ...>    ; comment
    start     ldr      R2, =0x3456789A      ; R2 ← 0x3456789A
              ldr      R1, x                ; R1 ← x
              adds     R0, R1, R2           ; R0 ← R1 + R2
              str      R0, y                ; value in R0 stored in memory address y

Directives: provide the assembler with information, e.g. values of symbols and code/data address placement.

    <label>   directive   parameter     ; comment
    x         equ         0x20000004    ; x ≡ 0x20000004
    y         equ         0x20000008    ; y ≡ 0x20000008
    x         dcb         0xdeadbeef    ; stores value 0xdeadbeef in memory address x
    y         db                        ; reserves 4 bytes in SRAM starting at y

NOTES:
• Labels ALWAYS represent memory addresses. They can be symbolic names or numbers.
• The opcode is a user mnemonic for the binary coding of the instruction type.
• The number of parameters varies from 0 to 3, depending on the instruction type.
• Directives direct the assembler. They do not translate to machine code!
• Some (pseudo) instructions convert to a sequence of machine instructions.

Addressing Memory
The addressing mode depends on what information is encoded (included) in the machine code:
• Immediate: constant data, e.g. MOV R2, #0x12
• Register: CPU register, e.g. MOV R2, R1
• Register Indirect: memory address in a register + constant offset, e.g. LDR R2, [R1, #0x25]

Note: Thumb-2 branch instructions use PC-relative addressing. The machine code holds a signed offset from the current value of the PC rather than an absolute address, e.g.

    loop    b    loop

In Thumb state an instruction that reads the PC sees its own address plus 4, and the branch offset is stored in half-words. This branch to itself is therefore encoded with an offset of -4 bytes = -2 half-words (0xFE).

Loading Registers with Constants
There is ONLY limited support for immediate addressing, for small and special constants.
EXAMPLE:

    MOV R1, #0xEF          ; OK: small enough constant to fit into the 16-bit machine code format
    MOV R1, #0xDEADBEEF    ; ??: may not be possible for all 32-bit constants
    LDR R1, =0xDEADBEEF    ; OK: pseudo-instruction

The pseudo-instruction LDR R1,=const generates an instruction that uses PC-relative addressing to reference the constant, which is stored at a nearby ROM location.

Example: Using a Pseudo-Instruction
Machine code generated (on the right) for the LDR pseudo-instruction (on the left):

    Source                            Generated code
            LDR  R1, =0xDEADBEEF   →          LDR  R1, [PC, #4]
    loop    add  R1, R1, #1           loop    add  R1, R1, #1
            B    loop              →          B    loop          ; target is 2 bytes before this instruction
                                              DC32 0xDEADBEEF

Notes:
• When an instruction reads the PC, it sees the address of that instruction plus 4 (rounded down to a word boundary for LDR).
• PC + #4 in the LDR instruction therefore points to the numerical constant 0xDEADBEEF placed in ROM after the code.
• The branch target is 2 bytes before the B instruction, i.e. 6 bytes before the PC value seen by the branch. The offset is stored in half-words, so the machine code parameter is -3, which is 0xFD in two's complement.

Addressing Memory - Example

                ORG   0x00000204       ; Directive: set the assembler memory counter to 0x00000204
                MOV   R0, #my_const    ; Immediate address: R0 <= 0x000000FF
                LDR   R1, =MY_LOOP     ; Pseudo-instruction using PC-relative addressing:
                                       ; value 0x10004000 loaded into R1
                LDR   R2, my_data      ; Direct address: 0xDEADBEEF loaded into R2
                ADD   R3, R0, R2       ; Register addressing: R3 <= R0 + R2
                JNE   [R1]             ; Register relative: jump on non-zero to 0x10004000
                STR   R2, [R1, #0x4]   ; Register relative with offset:
                                       ; store R2 at memory address 0x10004004

    ; Data Definitions
    MY_LOOP     EQU   0x10004000       ; Directive: define label (text replacement)
    my_const    EQU   0x000000FF       ; Directive: define label (text replacement)
                ORG   0x20004000       ; Directive: set the assembler memory counter
    my_data     DC32  0xDEADBEEF       ; Directive: reserve and initialise 4 bytes of RAM
                                       ; at the current memory counter value
    my_result   DC32                   ; Directive: reserve 32 bits of RAM
                                       ; at the current memory counter value
                END                    ; Directive: end of source file

[Figure: source code and its machine code. R1 is the internal register that stores the value of the variable counter; PC is the Program Counter register; the machine code is stored in little-endian order.]

Change of Flow Control
The code needs a comparison and a jump instruction.
Question: How many jumps altogether? Answer: 22
Question: How many comparisons? Answer: 22

Conditional Branch Instruction
BLT.N: Branch if Less Than
OFFSET = 0xFC (-4 in two's complement, counted in half-words from the instruction address plus 4)
Net effect: PC <- 0x10C - 4 = 0x108

Code Optimization

    Original                        Optimized
    while (counter < 21) {          do {
        ++counter;                      ++counter;
    }                               } while (counter < 21);

The optimized code is faster because the loop has one less instruction (no unconditional jump instruction is needed). Jumps slow down execution because they break the pipeline.